I. INTRODUCTION


A. BACKGROUND 

In almost any organization, one hopes that individuals at
high levels of authority are gifted with higher than average 
intelligence. Correspondingly, one would think that, given 
equal work effort, a more intelligent person will advance 
more rapidly than his contemporaries in an organization. 

It is not difficult, however, to find examples which
contradict our perceptions of the role of intelligence in
career advancement. In almost any field one can remember an
individual who was not the most intellectually gifted, but
through hard work and persistence, or other less quantifiable
traits, advanced as well as or better than persons of higher
measured mental ability. There is ample room for other 
influences to overwhelm the value of a person’s intelligence 
in the eyes of a superior. An unattractive personality, an 
inability to apply that intelligence to the tasks at hand, 
and a myriad of other flaws can discredit the merit of raw 
intelligence. 

The degree to which intelligence affects advancement
lies in the area of complex interaction between individuals
and organizations. It carries with it much of the
uncertainty of quantification of human performance.

Despite ample room for exceptions, the concept of a
general reward for being more intelligent still seems
reasonable. It may be, however, that to clearly see its
manifestation requires looking at a large number of people 
who have been affected by as similar a set of opportunities 
for advancement as possible. It is the task of this thesis 
to investigate this relationship within a fairly restricted, 
but numerically large population. The population is one 
which has had fundamental raw statistics uniformly obtained, 
and where policies to promote personnel are unambiguous and
well documented.


B. PURPOSE 

The purpose of this thesis is to answer a central 
question: Does a significant relationship exist between 
measures of intelligence and academic ability, and an 
individual’s promotion rate as a Noncommissioned Officer? 
Put more simply, does being smarter, as measured by initial 
test scores, or being better schooled, indicate that a person 
will perform better and, hence, advance more quickly than his 
peers? 

The answer to this question has important implications
for Army policies of recruitment, retention, and promotion.
It is also a matter of general interest to social scientists.


C. ORGANIZATION 
This thesis is organized fundamentally as a data analysis
investigation. Chapters I and II provide preliminary
information on the nature of the study variables, and briefly
review some related articles which have addressed this topic.
The remaining chapters discuss the analysis of approximately
forty thousand Noncommissioned Officer (NCO) records using
three related approaches. The first approach is a relatively
standard procedure of experimental data analysis. This
procedure begins with analysis of fundamental attributes of
individual variables, then advances through successive
increases in dimensionality and complexity. The second
approach examines the subset of the population whose
promotion rates fall in the top three percent of all NCO
promotion rates. Comparison of these top performers to the
remainder of the population identifies attributes which are
found to be significantly different, and hence are possibly
an associated cause for rapid advancement. In the third
approach, the statistical methods of principal components and
factor analysis are used to provide an alternative method of
critical variable selection, as well as to lend credibility
to the results of the other two approaches.


D. PRELIMINARY INFORMATION 

This section contains an initial discussion about the 
nature of the data, a general overview of the Army NCO 
promotion system, and a synopsis of the analytical tools used 
in this thesis. As previously mentioned, there is a degree
of imprecision in the measurement of intelligence and
academic ability, and also some confounding phenomena in
Army promotion policy. Early recognition of these problems
should set the degree of caution which is needed in
reviewing the subsequent chapters of analysis. The
section on analytical tools is intended to inform the reader
of the conditions under which the data analysis was
conducted, and the hardware and software used.
1. Intelligence Test Scores
a. General

The data for intelligence test scores falls into
the category sometimes referred to as Defined Measurement. A
Defined Measurement is one where the property being
considered cannot be measured directly. [Ref. 1:p. 6] As a
result, a related measure is substituted for measurement of
the actual property. In this case, the property is
intelligence, and the presumed related measurements are test
scores from a particular battery of tests.


The efficacy of intelligence tests as a representative
measure for intellectual ability is itself an issue
surrounded by controversy. This controversy has been the
topic of entire books and studies. The testing done by the
Army is the Armed Services Vocational Aptitude Battery, or
ASVAB. Although not designed specifically as an intelligence
test, the ASVAB does predict general trainability.
Additional research has shown that the mathematical and
verbal portions of the ASVAB have a high correlation to the
ACT, PSAT, and SAT college entrance examinations. [Ref. 2]
The ASVAB has been studied, improved, and used for over forty
years. A recent article by Jenson [Ref. 3:p. 35], in
Measurement and Evaluation in Counseling and Development,
states:

"To the degree that success in various occupations and
training programs requires different levels of general
ability (often called intelligence or IQ), an ASVAB
composite (it hardly matters which one) will be as
validly predictive as any test now on the market. . . . It
seems that the new ASVAB-14 is near the limit of
refinement, psychometrically."

Generally then, the ASVAB is a well documented and 
established aptitude test. Although the military does not 
specifically attempt to determine the intelligence of its 
potential candidates, academic portions of the ASVAB test 
have shown themselves to be reasonably defined measurements 
of intelligence. 

b. Specific Tests. 

The ASVAB consists of a battery of ten subtests.
Composites of the subtests of the ASVAB are used to determine
the overall acceptability of an individual requesting
enlistment, and for which field he or she would best be
suited. From the entire battery of tests, two derived scores
are taken as aggregate measures of intelligence. The first
is the GT, or general intelligence score. This score is the
aggregation of three submodules: word knowledge, paragraph
comprehension, and arithmetic reasoning. The second derived
measure of intelligence is the Armed Forces Qualification
Test Score, or AFQT. This score considers four submodules:
word knowledge, paragraph comprehension, arithmetic
reasoning, and numerical operations. [Ref. 10:sec 1-0] An
AFQT score is reported as a percentile score representing the
examinee’s relative standing in reference to a specific
population.

There has recently been some additional manipulation of
the AFQT score. In October of 1984, the reference population
for assignment of an individual’s AFQT percentile was shifted
from a base reference population of 1944 to that of 1980. A
base reference population is a set of values designed to
represent how the raw AFQT scores of the entire American
youth population would be distributed. This set of values
was originally designed in 1944, and had not been updated
since. This thesis utilized the 1980 base AFQT
percentiles. A transformation of test percentiles for
soldiers who enlisted prior to 1980 was effected by the
Defense Manpower Data Center (DMDC), and all subsequent
Department of the Army records have been computed based on
the 1980 reference. A listing for AFQT percentile
transformations can be found in APPENDIX A.
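
As an illustration of why the choice of reference population matters, the following is a minimal sketch of how a percentile score depends on the norming sample. The reference samples and raw score below are hypothetical values chosen for illustration; they are not the 1944 or 1980 norms, and the function is not the DMDC transformation listed in APPENDIX A.

```python
from bisect import bisect_left

def percentile_score(raw_score, reference_scores):
    """Percent of the reference sample scoring strictly below raw_score (minimum 1)."""
    ranked = sorted(reference_scores)
    below = bisect_left(ranked, raw_score)  # count of reference scores below raw_score
    return max(1, round(100 * below / len(ranked)))

# Hypothetical reference samples: the second population scores higher overall.
reference_old = list(range(20, 120))  # raw scores 20..119
reference_new = list(range(30, 130))  # raw scores 30..129

raw = 80
old_pct = percentile_score(raw, reference_old)
new_pct = percentile_score(raw, reference_new)
print(old_pct, new_pct)
```

The same raw score maps to a lower percentile when the reference population scores higher, which is why percentiles computed against different reference populations cannot be compared without a transformation.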

GT scores, which are expressed as the sum of the raw
test scores, have not been manipulated. However, unlike the
case with the AFQT score, soldiers have been allowed to
retake their tests to increase their original GT scores.
Retesting was introduced in 1982 when a minimum GT score of
120 was enforced on eligibility for promotion to NCO rank.
2. Academic Scores
a. General

The data used for academic ability is also a
defined measurement, similar to the measures for
intelligence. Specifically, the property of academic ability
is being represented by a simple assignment of the number of
years of education completed. This value is independent of
the quality of education, and the grades that any given
individual may have received. This study assumes that
continued attendance and progression through the educational
system is inherently indicative of academic ability. For
example, a high school graduate is assumed to have more
academic ability than an individual with an eighth grade
education. The informational value of academic scores is
thus not as great as desired. They are treated in analysis
as only ordinal scaled variables.

b. Specific

Three academic scores are used in the study:
present education level, education level upon entry into
the Army, and military education since entry. Because advanced
professional schooling is made available only to those
individuals who have superior service records, the military
education score carries with it some additional information
relative to the performance of the NCO.
3. Promotion Scores 
Promotion within the Army is a closely supervised and
somewhat complicated procedure. It is the product of a
considerable number of policies which are not uniformly
applied across the population. Instead, they are applied
within rank structure, within career field, or even as a
function of years of education. Thus, although the
computation of an individual’s promotion rate is an easy
task, that value may have been influenced by several policies
that were peculiar to the individual.
a. General

Promotion of NCO’s is governed by Army Regulation
AR 600-200. This regulation establishes requirements for
eligibility, and outlines the process of selection. The
system views the individual’s performance as a whole. This
includes a composite score based on performance scores,
commander’s ratings, service awards, and review by a board of
senior NCO’s. The Department of the Army uses this composite
point value as a threshold when promoting individuals to the
next higher paygrade as slots become available. The slots
are accounted for by career management field, and as such,
the minimum threshold for a combat soldier to be promoted
may be different from that of a support soldier. A general
observation is that career fields with more technical
orientation have higher promotion point thresholds, and
consequently longer times to advancement, than the larger
and less technically oriented career fields.


AR 600-200 also sets minimum times in service and grade
which an individual must have served to be considered for
promotion. Unless superseded by a special policy, the
shortest period for promotion to E-5 is two years, and to
E-6 is four years. These periods include waivers of time
in service and time in grade. Promotion to E-6 in four years
requires that the individual be advanced to E-5 in two years.

b. Specific

Because of the lack of uniformity of promotion
within the Army population, in this thesis we have taken
considerable care to identify and address discontinuities
which would confound promotion based on merit. This includes
the elimination of some data, and the computation of three
different promotion rate scores. The governing principle for
manipulation or restriction of data was to produce a sample
population in which each individual started from the same
point in the rank structure, and had equal opportunity for
advancement by merit. Chapter III, Overview of the Data,
discusses in detail the identified problems and what
corrective action was taken.
4. Analytical Tools Used 

This section briefly identifies the hardware and 
software used in analysis. 

a. Hardware

Computational resources used for analysis
included an IBM 3033 System 370 mainframe computer running
the MVS batch system. Additionally, analysis was done for
small data sets using a standard IBM microcomputer.
b. Software

Two software packages were used for the majority
of the data analysis. SAS Version 5 was used predominantly
for analysis resulting in tabular output, such as principal
components and factor analysis. [Ref. 4,5] Grafstat, an
unreleased IBM mainframe data analysis and plotting program,
was utilized for analysis requiring graphical output and for
confirmation of SAS tabular results. [Ref. 6,7]


E. SUMMARY 

The objective of this introduction has been to adequately 
frame the scope of the topic, and to present sufficient 
background to the reader so that he or she is alerted to some 
of the difficulties inherent in a topic of this nature. 
Also, this will establish a reference for some of the tools 
used to conduct the analysis. 

The length of this section is indicative of the degree of
preparation required to analyze a relationship which has
significant complications in both dependent and independent
variables. Although the list of assumptions and the
stripping of aberrant data makes one cautious about the
reality of such a study, each event should be considered on
its ability to uncover the answer to the central question of
this thesis. The central question again is whether or not a
significant relationship exists between measures of
intelligence and academic ability, and an individual’s
promotion rate as a Noncommissioned Officer. It is important
to learn whether measures of intelligence and academic
ability are important indicators of promotion in the Army,
and if so, how strong that relationship is. If sufficiently
reliable and believable relationships can be determined, then
policies could be designed to better identify and develop
capable individuals for positions of leadership.

The analysis of this thesis reduced the effects of
confounding policies, such as discriminatory promotion and
accession programs. It also used a sufficiently large sample
size, which allowed the averages to outweigh the exceptions.
It drew on data from standard personnel records, and made the
most effective use of that information.
II. A REVIEW OF PREVIOUS STUDIES


The topic of relating intelligence to some aspect of
performance is an extensive and rich area of study. It is a
particular topic of interest to social scientists and
military manpower specialists. As a demonstration of the
quantity of work done in this area, a simple
cross-referencing of the words intelligence test and
performance produced a list of 237 citations from Lockheed’s
DIALOG online information files. Restriction of available
references to those utilizing military intelligence test
scores and statistical analysis of those tests relative to
some performance measure still results in a large number of
citations. Within this restriction there is a variety of
study methodologies. A study can originate from an in-house
military analysis, a contracted study done by a commercial
analytical institute, or an academic institution making use
of military data as its medium for analysis.

The nature of the data is also varied. Several studies
readministered the ASVAB tests to a selected test population;
other studies used IQ and other intelligence measures in
addition to the ASVAB. The performance side of the
relationship had an extensive number of dependent variables.
Examples of performance measures were: results of written
exams, military skills test results, minority advancement,
and comparison to collegiate ACT, PSAT, and SAT tests.
This chapter will review four of the most closely 
related studies, concentrating for each one on: 
1. The objective of the study. 
2. The methodology used in analysis. 
3. The conclusion reached. 
The first analysis is from Are Smart Tankers Better?
AFQT and Military Productivity. [Ref. 8] This study is
essentially an in-house military analysis, the authors being
Army officers assigned to the Office of Economic and Manpower
Analysis, at West Point, New York. As described in the
title, the paper presents the results of an investigation in
which the crews of tanks were scored on their ability to
destroy targets on live fire ranges. The AFQT score of the
gunner and tank commander was one of several explanatory
variables, with the tank scores as the dependent variable.
The analysis methodology used a log-log production model with
ordinary least squares regression.
The result of their analysis is best summarized in this
paragraph from the study:
"That there exists a positive, statistically
significant relationship between AFQT and performance, is
a powerful result. The coefficients on the model means
that if we move, for example, from the AFQT score for an
average Category IV TC to the AFQT score for an average
Category IIIA TC, (a 200% increase), we will increase the
performance on Table 8 (the tank scoring exercise) by
approximately 20.3%."
In this study then, AFQT was found, by means of least squares
regression, to have a definitive relationship to a
well-defined skill measure, the conduct of tank firing.
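
To make the quoted figures concrete, the following is a minimal sketch of how a log-log model links percentage changes. It assumes the simplest form, performance = a * AFQT^b with other factors held fixed, and reads a 200% increase as a tripling of the AFQT score. The elasticity computed here is a back-of-the-envelope value implied by the quoted percentages, not a coefficient reported in the study, and the data used in the fitting check are synthetic.

```python
import math

def implied_elasticity(pct_change_x, pct_change_y):
    """Exponent b in y = a * x**b implied by a pair of percentage changes."""
    return math.log(1 + pct_change_y / 100) / math.log(1 + pct_change_x / 100)

def fit_log_log(xs, ys):
    """Ordinary least squares slope and intercept of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

# Elasticity implied by "a 200% increase" in AFQT giving a 20.3% gain.
b = implied_elasticity(200, 20.3)

# Synthetic, noise-free data generated from y = 5 * x**b (hypothetical values).
xs = [10, 20, 30, 50, 80]
ys = [5.0 * x ** b for x in xs]
slope, intercept = fit_log_log(xs, ys)
print(round(b, 3), round(slope, 3), round(math.exp(intercept), 3))
```

In a log-log model the slope is an elasticity, so under these assumptions a one percent rise in AFQT corresponds to roughly a 0.17 percent rise in the tank score.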


The second study is an analysis done at the University of
Iowa by the Cada Research Group titled: On Predicting
Success in Training for Males and Females; Marine Corps
Clerical Specialties and ASVAB Forms 6 and 7. [Ref. 9] This
report uses the ASVAB score as an explanatory variable for
success of recruits in training. The methodology used is
primarily regression; however, the scope of the regression
concentrates on identifying differences between male and
female performance. The implicit result in the study’s
discussion of the sex score differences is that the
regressions performed for each category were of useful
predictive value. An interesting note about this study is
that the inclusion of high school completion reduces the
difference between the male and female regression
coefficients.


The third study is a section of articles used in the
Report to the House and Senate Committees on Armed Services,
Defense Manpower Quality, Volume II, Army Submission.
[Ref. 10] The section of interest to this thesis was a study
done by the U. S. Army Training and Doctrine Command (TRADOC)
Systems Analysis Activity (TRASANA). The study uses AFQT, as
well as education level, sex, paygrade, time in service, time
in Military Occupational Specialty (MOS), and a dummy
variable reflecting General Equivalency Diploma (GED)
completion as explanatory variables. GED is a rating given
to individuals who did not graduate from high school, but who
have taken examinations to be rated as equivalent to a high
school graduate. A battery of tests given under controlled
conditions resulted in a net score which was made the
dependent variable. The battery of tests was designed to
represent how proficient a soldier was in his specific
career field. The battery included a written test as well as
a hands-on proficiency test.

The analysis method used was linear regression, with the
inclusion of a Durbin Instrument as a correction tool for
AFQT. The results are again best summarized from the report:

"The most important result is that AFQT Category I-IIIA
soldiers performed approximately 10% better overall than
[illegible] soldiers. . . . Furthermore, AFQT was a much more
important influence on performance in virtually all
instances than either education or experience, whether
measured in terms of time in service, MOS, or unit.
Thus, these results strongly support the validity of AFQT
as a predictor of performance in these military
occupational specialties."

This report, then, is very similar in conclusion to the
tank gunnery report, in which AFQT was shown through
regression to have a significant and measurable effect on
soldier performance in skill related tasks.

The last study reviewed is also from the collection found
in the Defense Manpower Study. [Ref. 11] The topic for this
study was the estimation of promotion rate. It is presently
the most similar study to the central theme of this thesis.
Using AFQT as one of the independent variables, a duration
model is applied to estimate the expected speed of promotion.
This model was applied within two categories, the paygrade
and the career field of the NCOs. This promotion estimation
study approaches the aggregation of data in a different
manner as well. Specifically, by evaluating the possibility
of promotion for each individual over a series of years, the
dimension of time was entered into analysis. A significant
advantage of including the time dimension was that changes in
the categorical levels of the population could be accounted
for, such as race or sex.

The methodology used in the promotion estimation study is
considerably more complex than in the previous studies.
Rather than using standard regression models, the study uses
the Generalized Linear Model form. Specifically, the form of
the predictive model is a log likelihood function using the
Weibull shape parameter. The explanatory variables include
education, AFQT, marital status, race, number of dependents,
time in service, sex, and high school completion status. By
using the Weibull model, the application of explanatory
variables which are not continuous, such as sex, high school
completion status, and marital status, is more proper.
Additionally, there are no requirements for the normality
assumptions for the residuals, and therefore, less
subjectivity to the appropriateness of the model with respect
to the independent variables. This method, however, does not
consider any in-service information and was calculated only
for very specific CMF and paygrade combinations. The results
are summarized as follows:

"A review of these promotion results reveals two
trends. First, even after controlling for high school
diploma status, AFQT Category I-IIIA soldiers are
promoted approximately 10% more rapidly than [illegible]
soldiers. Second, high school completion is less
important than AFQT score in determining promotion rates.
The remarkable aspect of this last result is that
educational attainment is an explicit part of the Army’s
promotion point system, while AFQT scores are not. These
trends are true for both promotion to E-5 and promotion
to E-6."
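
For readers unfamiliar with duration models, the following is a minimal sketch of a Weibull log-likelihood for time to promotion with right censoring (soldiers not yet promoted when observed). It is a generic textbook form without explanatory variables, offered only as an illustration; the parameterization, the function name, and the example durations are assumptions and are not taken from the promotion estimation study.

```python
import math

def weibull_log_likelihood(durations, promoted, shape, scale):
    """Log-likelihood of right-censored durations under a Weibull(shape, scale) model.

    durations: months observed for each soldier
    promoted:  1 if the promotion was observed, 0 if the record is censored
    """
    total = 0.0
    for t, d in zip(durations, promoted):
        z = (t / scale) ** shape
        if d:
            # log density: log(k/s) + (k - 1) * log(t/s) - (t/s)**k
            total += math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
        else:
            # log survival: -(t/s)**k
            total -= z
    return total

# Hypothetical durations in months; the last two soldiers are not yet promoted.
durations = [24, 30, 36, 48, 60]
promoted = [1, 1, 1, 0, 0]

# With shape = 1 the model is exponential, and the maximum likelihood scale
# is total observed time divided by the number of observed promotions.
scale_hat = sum(durations) / sum(promoted)
best = weibull_log_likelihood(durations, promoted, 1.0, scale_hat)
print(scale_hat, round(best, 3))
```

In a full model the scale would be written as a function of the explanatory variables, and the shape and coefficients would be chosen to maximize this quantity.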


As considerable attention has already been given to the 
topic of relating measures of intelligence to performance, 
and since positive results have generally been the result, 
one might wonder why another study should be undertaken. 
First, this thesis is in response to a request by the Office 
of the Deputy Chief of Staff for Personnel (ODCSPER) for 
further research in the relationship of AFQT to success in 
the Army. Secondly, this thesis will be different in its 
approach and analytical procedures. Following is a list of 
the unique characteristics of this thesis: 

1. The perspective of this thesis is that the results will
be used as a management tool, or as an explanatory
method for active duty Army personnel. In that light,
the study utilizes information collected from the
individual’s in-service record, such as his Skill
Qualification Scores, and his NCO schooling levels.
Similar to accession related studies, this analysis
includes intelligence, academic, and categorical
information as potential explanatory variables.
However, the intent is not to justify accession of high
quality soldiers, but to investigate the trends of
promotion for active duty personnel as a function of
available personnel data.

2. This study conducts significant investigation
into the data to identify and correct anomalies which
would confound the relationship in question.

3. Statistical analysis is done from the bottom up,
rather than by direct movement into regression models.
This approach finds that strict parametric models are
subject to error due to the inability of some data
variables to meet distributional assumptions necessary
for parametric analysis. The study then moves to
nonparametric means to approach the issue.

4. For regression models, given the cautions on their use,
an additional sample population is tested using the
model. Thus, the results from the initial model can be
considered to have more believability and fidelity than
a model based on analysis of a single population
sample.

5. The use of a large data set.*

6. Several explanatory variables have been made
available from the DMDC data base which have not been
used in previous studies. They include the initial
education at time of entry, NCO education level, and a
race variable with six categories.

7. The choice of promotion as the dependent variable
rather than a set of performance tests. Although prone
to more uncertainty than results of performance tests,
promotion is in many ways an ultimate performance
measure. The service, like any other organization,
recognizes superior performance by promoting and
advancing individuals to higher positions of authority.
As such, promotion rate, despite its problems, has a
strength of recognition well beyond that of technical
performance.*

8. This study uses graphical methods to depict many
of the analyses.


*Study number four from the Defense Manpower Study uses both
large data sets and promotion as a dependent variable.
III. OVERVIEW OF THE DATA


A. INTRODUCTION 


A critical aspect of this thesis was the selection and
screening of data. Two general guidelines were applied in
creating the data set. First, the data set had to
demonstrate a level of homogeneity in that the NCO’s
considered would all have served under similar enlistment and
advancement policies. Secondly, the selection of individual
records needed to be random and without unintentional bias to
meet the requirements for a representative sample set.
Section III C. describes in detail the measures taken to
ensure that the above two attributes were established in the
study data set.

Recoding of data values into numerical equivalents was
required for several personnel record fields. As an example,
the level of Military Schooling, which is the NCO’s
in-service schooling level, was recorded as mixed
alpha-numeric characters. Transformation involved rank
ordering the available levels of schooling in ascending
hierarchical order and substituting a numeric value for the
alpha-numeric value. Chapter IV discusses in detail the
background of each variable. Finally, as a check on the
effects of manipulating and restricting the sample data set,
section III D. provides a comparison of statistics for the
entire U.S. Army NCO database, versus the sample data set
used in this thesis.
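
The recoding step described above can be sketched as follows. The schooling codes shown are hypothetical placeholders, since the actual alpha-numeric codes are not listed here; only the technique of rank ordering the levels and substituting the rank is taken from the text.

```python
# Hypothetical schooling codes, listed from lowest to highest level.
SCHOOLING_ORDER = ["N0", "P1", "B2", "A3", "S4"]
RANK_OF = {code: rank for rank, code in enumerate(SCHOOLING_ORDER)}

def recode_schooling(code):
    """Return the ordinal rank of a schooling code, or None if unrecognized."""
    return RANK_OF.get(code.strip().upper())

raw_codes = ["b2", "N0 ", "S4", "ZZ"]
recoded = [recode_schooling(c) for c in raw_codes]
print(recoded)
```

Because only the ordering of the levels is meaningful, the resulting values should be treated as ordinal; differences between ranks carry no fixed meaning.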
B. DESCRIPTION OF THE VARIABLES

The data variables used in this study fall into three
categories: control variables, intelligence variables, and
promotion variables. The first two categories, control and
intelligence, were used as explanatory variables, while the
promotion variables were used as the dependent variables. A
brief description of each variable is tabulated in Table I.


TABLE I   Summary of Variables in Sample

Variable  Category   Meaning                          Value            Scale

Dependent
PRATE     Promotion  Raw promotion rate: number of    [illegible]-.21  Ratio
                     promotions per month to most
                     recent promotion
RATE      Promotion  Promotion rate difference from   [illegible]-9.4  Ratio
                     average for that paygrade
                     (normalized)
PRA       Promotion  Promotion rate difference from   -3.4-8.0         Ratio
                     average for that paygrade and
                     CMF (normalized)

Explanatory
SEX       Control    Male/Female                      0/1              Nominal
CMF       Control    Career Management Field          11-99            Nominal
RACETH    Control    Race/Ethnic group                1-6              Nominal
PAYGD     Control    Paygrade                         5-7              Ordinal
GTSCR     Intell     General Intelligence Score       0-160            Ordinal
AFQTP     Intell     Armed Forces Qualification Test  1-100            Ordinal
                     Score Percentile
OAFQTP    Intell     Same as AFQTP, referenced on     1-100            Ordinal
                     1980 population
EIMCAT    Intell     Mental Category; based on        1-8              Ordinal
                     OAFQTP
HIYRED    Intell     Highest Year of Education upon   [illegible]-12   Ordinal
                     entry into Army
EDLVL     Intell     Present Education Level          [illegible]-12   Ordinal
NCOE      Intell     Military Education Level         0-13             Ordinal
                     Attained
PQSCR     Intell     Army Proficiency Test            0-100            Ratio
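
The three promotion variables in Table I can be illustrated with a short sketch. It assumes that "normalized" means the difference from the group mean divided by the group standard deviation; the table does not state the exact normalization, so this is an assumption. The records and the field names promotions and months are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def raw_rate(promotions, months_to_last_promotion):
    """PRATE: promotions per month up to the most recent promotion."""
    return promotions / months_to_last_promotion

def normalized_by_group(records, key):
    """Standardized difference of each PRATE from the mean of its group."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["PRATE"])
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    out = []
    for r in records:
        m, s = stats[key(r)]
        # A group with no spread (for example a single record) gets 0.0.
        out.append((r["PRATE"] - m) / s if s > 0 else 0.0)
    return out

# Hypothetical records.
records = [
    {"PAYGD": 5, "CMF": 11, "promotions": 4, "months": 40},
    {"PAYGD": 5, "CMF": 11, "promotions": 4, "months": 50},
    {"PAYGD": 5, "CMF": 91, "promotions": 4, "months": 80},
    {"PAYGD": 6, "CMF": 11, "promotions": 5, "months": 100},
]
for r in records:
    r["PRATE"] = raw_rate(r["promotions"], r["months"])

rate = normalized_by_group(records, key=lambda r: r["PAYGD"])            # RATE
pra = normalized_by_group(records, key=lambda r: (r["PAYGD"], r["CMF"]))  # PRA
print(rate, pra)
```

Grouping by paygrade alone compares a soldier with all peers of the same grade, while grouping by paygrade and CMF removes differences in promotion thresholds between career fields.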
A more detailed description of each of the study
variables will be given in the first part of Chapter IV,
Successive Analysis.


C. PREPARATION OF THE DATA 

Preparation of the data began with acquiring fifty
thousand records from the U.S. Army Military Personnel Center
in Alexandria, Virginia. Initial restrictions on the data
were established to allow inclusion of only NCO’s with a date
of entry after January 1, 1976. Further, NCO’s selected had
to be members of the Regular Army, and not Reserve or
National Guard forces. These restrictions provided for
observation of only those NCO’s who were recruited a
reasonable time period following the ending of the Vietnam
War, and following the establishment of the All-Volunteer
Force. Restricting the NCO’s to Regular Army soldiers
focused the study on the standing forces alone, and avoided
confounding as a result of different promotion and accession
policies in the Reserve and Guard Forces.

The records requested were randomly drawn by taking every
fifth individual from an estimated population of 250,000
meeting the above restrictions. The fifty thousand MILPERCEN
records were then matched and merged with a similar personnel
database from the Defense Manpower Data Center (DMDC) in
Monterey, California. The DMDC database holds additional
information, including: the ability to distinguish high
school equivalent certificate holders from actual graduates,
the highest year of education of the soldier at time of
enlistment, and AFQTP and EIMCAT scores renormed for a 1980
population.
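
The sampling and merging steps above can be sketched in a few lines. The record layout and the id field are hypothetical, and the populations are scaled down for illustration; the actual extraction was performed on the MILPERCEN and DMDC databases and is not reproduced here.

```python
def every_kth(records, k, start=0):
    """Systematic sample: take every k-th record beginning at index start."""
    return records[start::k]

def merge_on_id(left, right, key="id"):
    """Keep left records that have a match in right, adding the right-hand fields."""
    lookup = {r[key]: r for r in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]

# Hypothetical scaled-down population: 250 records instead of 250,000.
population = [{"id": i, "PAYGD": 5} for i in range(250)]
sample = every_kth(population, 5)

# Hypothetical second database holding an extra field for some of the soldiers.
second_db = [{"id": i, "HIYRED": 12} for i in range(0, 250, 10)]
merged = merge_on_id(sample, second_db)
print(len(sample), len(merged))
```

Taking every fifth record is a systematic sample; it behaves like a random sample provided the ordering of the source file is unrelated to the variables under study.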

After the merging, data records which had missing values
in any of the critical variable fields were dropped. There
were approximately ten thousand records missing critical 
data. Following initial analysis of promotion rates, two 
additional restrictions were applied against the remaining 
records. 

First, a grouping of several hundred promotion rates 
showed that individuals had been promoted to the rank of E-5 
at rates which were as high as one promotion per month. 
Cross referencing of service numbers identified this sub- 
group as NCO’s who had served in Reserve or Guard units and 
who, for a variety of reasons, had been called for active 
duty. As such, they were allowed by regulation to carry with 
them an accelerated promotion to their former rank. 
Subsequently, a serial number match and elimination was done 
for all NCO’s with recent listing as Reserve or Guard status. 

A second source of unusual promotion rates at the E-5 
level became apparent in some of the more technically 
oriented career management fields, the medical field in 
particular. Research into Army special recruitment policy
indicated that during the early 1980’s special provisions
were made to allow persons with background ability in certain
technical fields to enter the Army and be promoted to NCO
status within six months, or in certain cases to receive NCO
status immediately following basic training.² To correct for
these anomalies, all promotion rates which fell outside the
maximum time periods considering application of such waivers
were discarded.


D. COMPARISON TO TOTAL ARMY STATISTICS 

In this section, selected attributes of the sample data 
set and the complete U.S. Army database are briefly compared, 
with the intent of checking the representativeness of the 
sample set. 

Population attributes such as distribution of sex, Career 
Management Fields, and paygrade were obtained from the 
complete U.S. Army database records consisting of over 
250,000 NCOs.

As described in paragraph 3.B, the sample data set of 
50,000 selected records had been filtered to contain only 
personnel who entered the Army after 1976. Screening of 
those 50,000 records for completeness of data and uniformity 
of promotion policy, reduced the number in the sample set to 
approximately 38,000. It was prudent then, to check the 
final sample set to see if it retained its representative 
character as a random sample. It should be noted, however,
that this comparison will not occur for all study variables.
Reasons for this include non-availability of records from the
MILPERCEN database, and cases where the statistic was
produced through computation by the author, promotion rates
being the principal example.

² MSG Knopp, NCOIC Defense Management Data Center, West
El Estero Drive, Monterey CA 93946.

1. Comparison of Army versus Sample Summary Statistics 

Formal hypothesis testing for means or distributions 
with ANOVA was unavailable due to computational and software 
restrictions. However, since the intent of this section was 
simply to identify any population shifts, and the magnitude
of those shifts, observation of summary statistics is assumed 
to be sufficient. Specifically, the means and the standard 
deviations of four variables were obtained from both the 
entire NCO population data set and the thesis sample data 
set. The percent difference between the variable means was 
computed and expressed relative to the thesis sample data. A 
table of comparative statistics and the percent difference is
shown in Table II.


TABLE II Total Army vs Sample Summary Statistics

              Total Army (250,000)      Sample (37,854)
Variable      Mean         Std Dev      Mean     Std Dev    Percent Difference
AFQTP         48.3         25.72        53.4     20.9       Sample 10% >
SEX           1.03         [illegible]  1.12     .328       Sample 2.7% >
RACETH        1.63         .991         1.65     .942       Sample 1.2% >
PAYGD         [illegible]  .597         5.27     .464       Sample 5.2% <
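
The percent difference column can be reproduced with a short calculation. The sketch below assumes the difference is taken as (sample mean minus Total Army mean) divided by the sample mean, which is how the text describes it; the function name is illustrative, and only rows whose means are fully legible are used.

```python
def percent_difference(sample_mean, army_mean):
    """Difference of means as a percentage of the sample mean."""
    return (sample_mean - army_mean) / sample_mean * 100.0


# Means from the legible rows of Table II: (Total Army, Sample).
table_ii = {"AFQTP": (48.3, 53.4), "RACETH": (1.63, 1.65)}

for name, (army, sample) in table_ii.items():
    diff = percent_difference(sample, army)
    direction = ">" if diff > 0 else "<"
    print(f"{name}: Sample {abs(diff):.1f}% {direction}")
```

For AFQTP this gives roughly 9.6 percent, which the table reports rounded to 10 percent.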


The three variables AFQTP, SEX, and PAYGD have
noticeable changes between the Sample and the Total Army,
while the RACETH variable doesn’t appear to have been
affected much by sampling. A closer look at the discrete
distributions, and an overall conclusion about differences in
the two data sets follows.
2. Discrete Distributions 

Figures 3.1 and 3.2 illustrate differences in the 
discrete distributions for paygrade and race respectively. 
Both plots are clustered bar charts, with the percentage of
each level of the discrete variable for both the Total Army
and the Sample plotted next to each other.


[Figure 3.1: Army vs Sample Paygrade Percentages (clustered bar chart; x-axis: PAYGRADE VALUES; y-axis: PERCENTAGE; series: Total Army, Sample)]

[Figure 3.2: Army vs Sample Race Percentages (clustered bar chart; x-axis: RACETH VALUES, with categories White, Black, Hispanic, Indian, Asian, Other; y-axis: PERCENTAGE; series: Total Army, Sample)]

Observation of the tabular data and bar charts shows
that there are some differences between the two populations.
Specifically, the sample contains more lower ranking
personnel, slightly more women, and significantly higher
AFQTP related scores. The racial make-up of the sample
appears to be similar.

The restriction of random sampling to only those persons
entering the service after 1976 can directly or indirectly
explain these differences. First, the lower average paygrade
is a direct result of promotion policy, in which it is
impossible to achieve a rank above E-7 in less than ten
years. Hence, the sample population should demonstrate a
lower average paygrade. Secondly, the slight increase in the
proportion of women might be explained by a general opening
up of the services to women in the late seventies and early
eighties. Thirdly, the higher AFQTP is a direct result of
policy restrictions begun in Fiscal Year 1981, and formalized
by the 1984 Defense Authorization Act. This placed quality
constraints on AFQT Category and high school diploma status.
[Ref. 10:secs 1-0, p.ie] Whether these restrictions, or the
general improvement of social acceptance of the military
services resulted in this AFQT improvement is a question
which would require significant study in itself.

In short then, the sample is different in several ways 
from the total NCO population. It should be noted, however, 
that these results are intentional. The shifts caused by 
restricting the sample to after 1976 are felt to be less 
dangerous to the study than the alternative of including 
soldiers who were accessed during the draft and the era of 
Viet Nam War policies. Finally, it is only a matter of time, 
unless significant changes in accession and promotion policy 
occur, before the character demonstrated by the sample data 
set will constitute the norm for all NCOs. Thus, it is
concluded that the study sample is satisfactory.


IV. SUCCESSIVE DATA ANALYSIS


A. INTRODUCTION

In this chapter the results of a systematic method for
data analysis will be reported. This method of analysis
followed a format which is described by Chambers in Graphical
Methods for Data Analysis. [Ref. 12] This procedure develops
an understanding of the data, beginning with simple
univariate descriptive procedures, then progressing through
several increases in dimensionality of variables, and finally
into the more complex inferential procedures of model
building and multivariate regression. An abbreviated outline
of this procedure is shown below.

1. Analysis of single variables.
2. Comparison of variable distributions.
3. Analysis of paired variables.
4. Multivariate graphical analysis.
5. Linear Models including:
   a. Simple Regression
   b. Multivariate Models



In addition to these steps, this procedure will be
supplemented with several non-graphical measures, such as
ANOVA, ANCOVA, and several tabular nonparametric methods. It
should be noted that this analysis reports only those
procedures which are considered an essential step in
investigation, or whose results provided an observation of
merit. Many available procedures have not been used in this
chapter, as a consequence of the data failing to meet
distributional assumptions, and for other reasons which would
make such analysis inappropriate. During the development of
this chapter, the results of each level of analysis will
specify why the next set of analysis procedures was pursued.
Alternatively, if a popular class of procedures is
disregarded, the logic for disregarding it is explained.

The objective of detailing this procedure is to present a
thorough depiction of the nature of the variables, and to
explain the development of resulting inferences and models.


B. UNIVARIATE ANALYSIS. 
1. Dependent Variables 
a. PRATE 

(1) General. The variable PRATE represents the 
raw promotion rate of a particular individual. Numerically, 
it is the total of promotions per month up to the most recent 
promotion. 

(2) Value. The variable PRATE was computed
using data obtained from the DMDC database. The time to most
recent promotion in months was found by subtracting the basic
pay entry date from the date of latest award of rank. This
number then became the denominator of a ratio having the
individual’s rank, or equivalently, the total number of
promotions the individual has received, as the numerator:

                      Individual’s Latest Rank
PRATE = -----------------------------------------------------
        (Award Date of Latest Rank) - (Date of Entry in Army)

Ranks were numerically represented with a score of 5 for
an E-5 Sergeant, and with 6 and 7 for values of the next two
ranks. The resulting units of measurement for the PRATE
variable were: units of promotion per month of service.
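
The PRATE ratio can be illustrated with a short sketch. It assumes that service time is counted in whole calendar months and that the day of the month is ignored; the text does not state the exact date arithmetic used, and the function names and the example soldier are hypothetical.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def prate(rank, entry_date, latest_rank_date):
    """Raw promotion rate: numeric rank divided by the months of
    service up to the most recent promotion."""
    months = months_between(entry_date, latest_rank_date)
    if months <= 0:
        raise ValueError("latest rank date must follow the entry date")
    return rank / months


# Hypothetical soldier: entered January 1978, promoted to E-5 in February 1982.
example = prate(5, date(1978, 1, 1), date(1982, 2, 1))
print(round(example, 5))  # 0.10204
```

An E-5 reached after 49 months gives 5/49, or about 0.102 promotions per month of service.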
(3) Attributes of the Variable. The variable
PRATE qualifies as a continuous variable with a ratio scale.
The continuous nature of the variable relies on the fact that
the number of months service combined with three rank
structures yields sufficient combinations of values, actually
190 in all, to use as measures.

There are some inherent problems with the raw PRATE 
score, since promotion policies are in effect which set 
minimum time thresholds for promotion. Thus, the promotion
rate of an individual who is presently an E-5 will not be
comparable to the promotion rate of an E-7 whose three
promotions have been affected by the minimum time policy.
Generally, the minimum time in service between promotions
grows as rank increases, and more senior soldiers will
normally have lower raw promotion rates.

A second source of bias is potentially found in the 
Career Management Field (CMF) of the soldier. Army promotion 
policy is based on a system of minimum performance points to 
be attained within a CMF in order to be considered for 
promotion. Generally, the more technical fields will have
higher promotion point thresholds than non-technical fields. 

The distribution of the variable PRATE and its summary
statistics are shown in Figure 4.1. The shape of the
histogram is positively skewed, demonstrating a steep
ascending slope in the first partitions, then a generally
flat shape until just past the median value. After the
median value, a gradual downward sloping tail occurs. A
rough interpretation of this shape is that there appears to
be a few individuals who are promoted at very fast rates,
followed by a block of average promotion rates, then a
diminishing tail of individual promotion rates which fall to
the right of the seventy-fifth percentile.

[Figure 4.1: PRATE Histogram and Statistics (x-axis: PRATE, 0.04 to 0.20)]

HISTOGRAM TABLE
X                : PRATE
SELECTION        : ALL
X LABEL          : PRATE
NO. OF ELEMENTS  : 37854
MEAN             : 0.10946
STD. DEVIATION   : 0.036322
SKEWNESS         : 0.59367
KURTOSIS         : 2.5854
5-PERCENTILE     : 0.061225
25-PERCENTILE    : 0.08
MEDIAN           : 0.10204
75-PERCENTILE    : [illegible]
95-PERCENTILE    : 0.17857
MIN.             : 0.041667
MAX.             : 0.20833

Distribution transformation of this variable was not
attempted, primarily because its usefulness in testing or
modelling is limited by the problems associated with the bias
factors described above.

b. RATE

(1) General. The variable RATE is a re-expression
of the variable PRATE. It has bias due to
individual rank removed by normalizing each individual score
relative to his or her paygrade.

(2) Values. To compute the variable RATE, the
average PRATE value for each paygrade was calculated, as well
as the standard deviation for that paygrade. Individual
scores were then normalized by the transformation:

         PRATE_i - AVERAGE for that Rank
RATE_i = --------------------------------
         STANDARD DEVIATION for that Rank

(3) Attributes of the Variable. The variable
RATE is also a continuous ratio scale variable, as it is a
transformation of PRATE.

The removal of influence due to rank was confirmed by 
computing the correlation coefficient between the variables 
RATE and PAYGD. As seen in Table X, a value of near zero
resulted where the previous correlation coefficient for PRATE
and PAYGD had been -.495. Thus, the transformation to RATE
from PRATE results in a variable independent of PAYGD.
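
The within-paygrade normalization and the correlation check can be sketched as follows. The data are hypothetical and chosen only so that higher paygrades carry lower raw rates. The population form of the standard deviation is an assumption, since the text does not say which form was used, and the helper names are illustrative.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def normalize_within_groups(values, groups):
    """Z-score each value against the mean and standard deviation
    of the group (here, paygrade) it belongs to. Every group needs
    at least two distinct values so its deviation is nonzero."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical PRATE values for three soldiers in each paygrade.
paygd = [5, 5, 5, 6, 6, 6, 7, 7, 7]
prate = [0.10, 0.12, 0.14, 0.07, 0.08, 0.09, 0.05, 0.06, 0.07]

rate = normalize_within_groups(prate, paygd)
print(round(pearson(prate, paygd), 3))   # strongly negative
print(abs(pearson(rate, paygd)) < 1e-9)  # True: every group mean is now zero
```

Because each paygrade is centered on zero, the correlation between the normalized score and paygrade vanishes by construction. The same function applies to a score like PRA if the group key is the (CMF, PAYGD) pair instead of paygrade alone.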

The distribution shape of the RATE histogram, shown in
Figure 4.2, appears slightly non-normal, but a check of the
summary statistics for quantiles shows that they correspond
closely to the standard normal quantiles. Thus, the
assumption of normality for procedures using this variable is
still reasonable, based on observation of the distribution
shape and the close agreement of quantile values.

Figure 4.2 presents a histogram and summary statistics for
the RATE variable.

[Figure 4.2: RATE Histogram and Statistics (x-axis: RATE)]

HISTOGRAM TABLE
X                : RATE
SELECTION        : ALL
X LABEL          : RATE
NO. OF ELEMENTS  : 37854
MEAN             : -1.585E-6
STD. DEVIATION   : 0.99997
SKEWNESS         : 0.21408
KURTOSIS         : 2.3767
5-PERCENTILE     : -1.5476
25-PERCENTILE    : -0.77578
MEDIAN           : -0.03757
75-PERCENTILE    : 0.70754
95-PERCENTILE    : 1.6234
MIN.             : -2.2681
MAX.             : 3.6685

c. PRA 
(1) General. The variable PRA is another
recomputation of the raw promotion rate. PRA controls for
the career management field as well as paygrade. It is a set
of normalized promotion scores, which are independent of
PAYGD and CMF. Verification of the independence of PRA from
these variables was also confirmed by checking correlation
coefficients. Both variables CMF and PAYGD had near zero
values of correlation with PRA.

(2) Values. Computing the variable PRA was done
in the same manner as in RATE, however a mean and standard
deviation for each CMF and PAYGD combination was computed and
used in the normalization equation.

(3) Attributes. PRA is a continuous variable
with a ratio scale. The distribution of PRA appears normal,
with the quantile values very close to the standard normal.



[Figure 4.3: PRA Histogram and Statistics (N=37854; x-axis: PRA)]

HISTOGRAM TABLE
X                : PRA
SELECTION        : ALL
X LABEL          : PRA
NO. OF ELEMENTS  : 37854
MEAN             : [illegible]
STD. DEVIATION   : 0.99881
SKEWNESS         : 0.21406
KURTOSIS         : 2.6652
5-PERCENTILE     : -1.5510
25-PERCENTILE    : -0.75252
MEDIAN           : -0.04146
75-PERCENTILE    : 0.69604
95-PERCENTILE    : 1.7086
MIN.             : -3.4988
MAX.             : 4.5374

A comparison of percentiles for the PRA distribution
versus the standard normal distribution is shown in Table III.
Specifically, the PRA percentile values are listed with the
corresponding standard normal percentile values for the same
data point. For example, -1.5510 is the PRA fifth percentile,
while -1.5510 indexed in a standard normal table results in
a six percent value.

TABLE III Comparison of PRA vs Standard Normal Percentiles

PRA Percentile    PRA Value    Standard Normal
 5%               -1.5510       6%
25%               -0.75252     22.6%
50%               -0.04146     48.4%
75%                0.69604     75.7%
95%                1.7086      95.6%


Normality for this variable will be assumed based on 
general distribution shape and the close correspondence of 
the data percentiles to the standard normal percentiles. 
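
The table lookup behind this comparison can be checked with a brief sketch. It evaluates the standard normal cumulative distribution at each PRA percentile value reported with Figure 4.3 and prints it beside the corresponding Table III entry. Only the four clearly legible table entries are used, and the variable names are illustrative.

```python
from statistics import NormalDist

# PRA percentile values from the Figure 4.3 statistics, paired with
# the standard normal percentages listed in Table III.
pra_percentiles = [
    ("5th", -1.5510, 6.0),
    ("25th", -0.75252, 22.6),
    ("50th", -0.04146, 48.4),
    ("75th", 0.69604, 75.7),
]

standard_normal = NormalDist(mu=0.0, sigma=1.0)

for label, value, tabled in pra_percentiles:
    computed = standard_normal.cdf(value) * 100.0
    print(f"{label}: PRA value {value:+.5f} -> {computed:.1f}% (table: {tabled}%)")
```

The computed values agree with the tabled ones to within about a tenth of a percentage point. Small gaps of that size would be expected if the original figures were read from a printed table indexed to two decimal places.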

2. Control Variables 

d. SEX
The variable SEX is discrete and nominal. Males
are represented by a numerical value of one, and females are
represented with a two. In the study sample, 12.29 percent
of the sample was female, and 87.71 percent was male.
e. CMF 
Career Management Field (CMF) is a discrete
variable with nominal scale. Thirty three CMF’s are
represented in the sample. Each Career Management Field is
assigned a numerical value, for example, the Infantry branch
is designated as CMF 11. These assignments are a Department
of the Army numbering system, and can be reviewed along with
the CMF percentage and frequency table in Appendix A.

There is some ordinal information in the numbering
system. For instance, low CMF numbers are indicative of a
combat branch, such as Infantry or Armor. Center CMF values
are indicative of combat support branches, such as Signal or
Chemical. Upper CMF values are from the combat service
support branches, such as Medical and Language Specialist.

Figure 4.4, the CMF histogram, does reflect the
distribution of the three general groupings of CMF densities:
combat, combat support, and combat service support. The
combat and combat support values have roughly equivalent
representation, while the upper numbered service support
CMF’s are about two thirds the size of the other groups.


[Figure 4.4: CMF Histogram (N=37854; x-axis: CMF, 20 to 100; y-axis: NO OF SAMPLES; regions labeled COMBAT, COMBAT SPT, COMBAT SVC SPT)]

f. RACETH

The race-ethnic variable is a discrete, nominal
variable. The values represented and their percentages are
shown in Table IV.

TABLE IV Sample Race Percentages

Category                          Percent    Cumulative Percent
White                             52.43      52.43
Black                             38.59      91.02
Hispanic                          5.6        96.6
American Indian/Alaskan Native    .265       96.86
Asian/Pacific Islander            1.15       98.01
Other/Unknown                     1.99       100.00





g. PAYGD 
Paygrade is a discrete, nominal variable. The
selection of NCO rank from personnel enlisting after 1976
resulted in representation by paygrades E-5 through E-7 only.
The distribution of PAYGD is shown in Table V.

TABLE V Sample Paygrade Percentages

Rank                  Percent    Cumulative Percent
Sgt E-5               73.29      73.29
Staff Sergeant E-6    25.89      99.19
SFC E-7               0.81       100.00

The 0.81 percent for E-7 results in only 307 SFC’s in the
sample. Despite the preponderance of representation by the
other ranks, a sample size of 307 for the E-7 rank still
allows for adequate representation of that subcategory.

3. Intelligence and Academic Scores
h. GTSCR 

The General Intelligence Test Score (GTSCR) of 
the individual is a continuous variable with at least an 
ordinal scale. The range of values runs from 50 through 160.
The lower value of 50 represents the corresponding minimum
score of ASVAB modules that would allow for enlistment in the
Army. The histogram of the GTSCR variable, shown in Figure
4.5, is approximately normal. Checking the quantiles shows a
larger density in the distribution to the left of the mean,
with slightly lower values for quantiles right of the mean.


[Figure 4.5: GTSCR Histogram and Statistics (x-axis: GTSCR, 60 to 160)]

HISTOGRAM TABLE
X                : GTSCR
SELECTION        : ALL
X LABEL          : GTSCR
NO. OF ELEMENTS  : 37854
MEAN             : 108.23
STD. DEVIATION   : 14.275
SKEWNESS         : 0.129
KURTOSIS         : 3.3632
5-PERCENTILE     : 84
25-PERCENTILE    : 99
MEDIAN           : 109
75-PERCENTILE    : 117
95-PERCENTILE    : 130
MIN.             : 54
MAX.             : 156

i. AFQTP

The Armed Forces Qualification Test Percentile is
a continuous variable with ordinal scale. Its value
represents the relative standing of an individual’s test
score referenced against a 1944 population. This means that
an individual’s raw AFQT score is compared against a standard
table of values that was developed in 1944. This table of
values from 1944 was designed to represent the distribution
of raw AFQT test scores for the entire 1944 American youth
population. Hence, a resulting individual AFQT score is
simply the corresponding percentile of the individual raw
AFQT score relative to the entire 1944 population AFQT test
distribution.

The histogram and summary statistics for AFQTP are shown
in Figure 4.6. The density of AFQTP is partially symmetric
about the mean. The fifth percentile is at a
value of 21, demonstrating the restriction applied to CAT V
and VI personnel since 1980. Use of the AFQT score for this
study is primarily for comparative reasons. AFQT cannot be
used in any developed model since scoring against the 1944
reference population has ceased. As will be seen in
subsequent chapters, AFQT was discarded anyway when OAFQT
proved to be a better explanatory variable.

[Figure 4.6: AFQTP Histogram and Statistics (N=37854; x-axis: AFQTP, 20 to 100)]

HISTOGRAM TABLE
X                : AFQTP
SELECTION        : ALL
X LABEL          : AFQTP
NO. OF ELEMENTS  : 37854
MEAN             : 53.419
STD. DEVIATION   : 20.965
SKEWNESS         : 0.20913
KURTOSIS         : 2.2128
5-PERCENTILE     : 21
25-PERCENTILE    : [illegible]
MEDIAN           : 50
75-PERCENTILE    : 68
95-PERCENTILE    : [illegible]
MIN.             : 10
MAX.             : 99

j. OAFQTP
The OAFQTP variable is a continuous variable with
ordinal scale. It is fundamentally the same as the AFQTP
variable, excepting the reference for measurement, which is a
1980 population. The distribution for OAFQTP is considerably
more dense in the lower values than AFQTP. Explanation of
this shift can be seen by reviewing the transformation tables
in Appendix A for converting 1944-based scores to 1980
scores. The transformations for values below 80 result in a
1944-based score being reduced in almost every case. The
amount of reduction varies, but it can be as much as four
points. Only when the scores go above 85 are there any
increasing transformations.

[Figure 4.7: OAFQT Histogram and Statistics (x-axis: OAFQT, 0 to 100)]

HISTOGRAM TABLE
X                : OAFQTP
SELECTION        : ALL
X LABEL          : OAFQT
NO. OF ELEMENTS  : 37854
MEAN             : 45.319
STD. DEVIATION   : 24.779
SKEWNESS         : 0.53139
KURTOSIS         : [illegible]
5-PERCENTILE     : 14
25-PERCENTILE    : 25
MEDIAN           : 41
75-PERCENTILE    : 64
95-PERCENTILE    : 92
MIN.             : 7
MAX.             : 99

k.  EIMCAT 


EIMCAT is the mental category of an individual 
based on the 1980 reference population AFQT test score. 
EIMCAT is a discrete and ordinal scale variable. The 
assignment of categories is a Department of Defense standard, 
and is a common reference for all services. The breakdown of 


values is as follows: 


TABLE VI Sample Mental Category Percentages

Value   Category    AFQT     Percent       Cumulative Percent
1       Cat V       01-09    [illegible]   [illegible]
2       Cat IV C    10-15    6.4736        7.067
3       Cat IV B    16-20    9.788         16.854
4       Cat IV A    21-30    19.187        36.041
5       Cat III B   31-49    26.116        62.157
6       Cat III A   50-64    13.056        75.2
7       Cat II      65-92    19.99         95.2
8       Cat I       93-99    4.8           100.000


A histogram of the EIMCAT values follows in Figure 4.8. 


[Figure 4.8: Sample EIMCAT Distribution (bar chart of percent)]

Observation of the above figures demonstrates more
clearly the fact that categorization into EIMCAT category is
not evenly distributed across the scale of OAFQT scores. For
example, the center EIMCAT, value five, spans almost twenty
points, while EIMCAT eight contains only the upper seven
point scores. EIMCAT does make available an established,
discrete scale measurement representing intelligence test
scores for use in appropriate statistical procedures.
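
The mapping from an AFQT percentile to an EIMCAT value can be sketched with a sorted list of band boundaries. The boundaries follow the ranges in Table VI; the Cat IV C and Cat IV A bands are taken here as 10-15 and 21-30, the usual Department of Defense cut points, which is an assumption where the table is hard to read. The function and constant names are illustrative.

```python
from bisect import bisect_right

# Lower bound of each AFQT percentile band, in ascending order.
# The position of a band in this list, plus one, is its EIMCAT value.
LOWER_BOUNDS = [1, 10, 16, 21, 31, 50, 65, 93]
CATEGORY_NAMES = ["V", "IV C", "IV B", "IV A", "III B", "III A", "II", "I"]


def eimcat(afqt_percentile):
    """Return (EIMCAT value 1-8, category name) for a percentile 1-99."""
    if not 1 <= afqt_percentile <= 99:
        raise ValueError("AFQT percentile must be between 1 and 99")
    value = bisect_right(LOWER_BOUNDS, afqt_percentile)
    return value, CATEGORY_NAMES[value - 1]


print(eimcat(41))  # (5, 'III B')
print(eimcat(93))  # (8, 'I')

# Width of each band in percentile points, showing the uneven spacing.
uppers = LOWER_BOUNDS[1:] + [100]
print([hi - lo for lo, hi in zip(LOWER_BOUNDS, uppers)])
# [9, 6, 5, 10, 19, 15, 28, 7]
```

The printed widths make the uneven spacing concrete: category five covers nineteen percentile points, while category eight covers seven.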
l. HIYRED

HIYRED is the highest year of education held by
the individual upon entry into the Army. It is a discrete
and ordinal scale variable. The values and distribution
percentages are shown in Table VII.

TABLE VII Sample Highest Year of Education

Category                         Percent       Cumulative Percent
1-7 Years                        [illegible]   [illegible]
8 Years                          [illegible]   [illegible]
1 Year High School               [illegible]   [illegible]
2 Years High School              4.7           [illegible]
3-4 Years HS (no diploma)        6.29          [illegible]
High School GED                  4.813         [illegible]
High School Diploma              71.274        [illegible]
1 Year College                   [illegible]   [illegible]
2 Years College                  3.453         [illegible]
3-4 Years College (no degree)    1.337         [illegible]
College Graduate                 2.560         [illegible]
Masters or Equivalent            0.05          [illegible]
Doctorate or Equivalent          [illegible]   [illegible]

m. EDLVL 
EDLVL is the present level of education for the 
individual. These scores are related to HIYRED, in that any 
education taken by the individual subsequent to enlistment is 
recorded in this variable. A GED equivalency is included as
a value of six for high school completion.

TABLE VIII Sample Education Level Percentages

Value   Category                         Percent       Cumulative Percent
1       1-7 Years                        .042          .042
2       8 Years                          [illegible]   [illegible]
3       1 Year High School               [illegible]   [illegible]
4       2 Years High School              [illegible]   [illegible]
5       3-4 Years HS (no diploma)        [illegible]   [illegible]
6       High School Diploma              [illegible]   [illegible]
7       1 Year College                   [illegible]   [illegible]
8       2 Years College                  [illegible]   [illegible]
9       3-4 Years College (no degree)    [illegible]   [illegible]
10      College Graduate                 [illegible]   [illegible]
11      Masters or Equivalent            [illegible]   [illegible]
12      Doctors or Equivalent            [illegible]   [illegible]

Observation of Figure 4.9, or percentages in Table VIII,
shows an observable upward shift of education level after
enlistment. This is possible, and encouraged, with official
continuing education and high school completion programs.

[Figure 4.9: HIYRED and EDLVL Percentages]

n. NCOE 
The Noncommissioned Officer Education variable,
NCOE, is a discrete and ordinal scale variable. It reports
the level of military schooling accomplished by the
individual. Military schooling categories are generally
organized in three ascending levels: primary, basic and
advanced. At the two lower levels, primary and basic, there
are separate courses for combat and non-combat CMF’s. In
some cases, there has been an award of an On-The-Job Training
qualification. The OJT award is used to give credit to an
NCO who can achieve technical competence in advance of being
eligible for promotion to the next higher paygrade.

As previously mentioned, attendance at military schools
is sometimes associated with an individual being previously
identified as a superior performer. This is true mostly in
the advanced level schools where selection for attendance is
through Department of the Army Selection Boards. At the
primary level, local commanders have authority to establish
selection procedures and often will make primary school
attendance a locally mandatory requirement for junior NCOs.
Table IX and Figure 4.10 demonstrate the categories and
distribution of NCOE.


TABLE IX Sample NCOE Percentages

Value         Category                             Percent       Cumulative Percent
0             Nonparticipant                       [illegible]   [illegible]
1             Primary NCO Course (CBT CMF)         [illegible]   [illegible]
2             Primary Leadership Graduate          [illegible]   [illegible]
3             On-The-Job Credit for E-5 skills     [illegible]   [illegible]
4             Primary Technical Course Graduate    [illegible]   [illegible]
5             On-The-Job Credit for E-6 skills     [illegible]   [illegible]
6             Basic Technical Course Graduate      [illegible]   [illegible]
7             Basic NCO Course (CBT CMF)           [illegible]   94.?
8             On-The-Job Credit for E-7 skills     [illegible]   94.?
9             Advanced NCO Course Selectee         [illegible]   96.?
[illegible]   Advanced NCO Course Graduate         [illegible]   99.?
[illegible]   Advanced NCO nongraduate, OJT        [illegible]   [illegible]
[illegible]   On-The-Job Credit for E-8 skills     [illegible]   [illegible]

Figure 4.10 presents a histogram of NCOE discrete levels.

[Figure 4.10: Sample NCOE Schooling Percentages (bar chart)]

o. PQSCR

PQSCR is a report of the Primary Military
Occupation Skill Qualification Test Score (SQT) of the
individual. It is a continuous and ratio-valued variable.
The SQT is a service related test which is used to determine
the technical competence of a soldier. SQT score has been
used by promotion boards as a qualitative measure for
promotion. The numerical value represents the percent of
correct answers on a written and hands-on evaluation.
Separate SQT tests are written for each CMF, although the
structure of the tests is similar.

The distribution of PQSCR, shown in Figure 4.11, is more
dense in the upper values, with an abnormally long left tail
extending to a lower bound of 21. An explanation for the
shape of the PQSCR distribution is an involved topic, and has
itself been the subject of study. A general observation is
that PQSCR has previously been used in a manner where
individual soldier scores were often aggregated as a means of
comparison of the parent unit of the soldiers. [Ref. 11:p. 4]
Thus, significant unit and individual training emphasis has
been focused on SQT testing in previous years, and pressure
to perform well was influenced by the parent organizations.
As a result, a negatively skewed distribution, rather than a
normal distribution, is understandable.

[Figure 4.11: PQSCR Histogram and Statistics (N=37854)]


5. Summary

The fifteen variables used in this study demonstrate
a wide variety of characteristics. All of the dependent
variable choices were continuous with two, RATE and PRA,
showing only slight departures from normality. The other
continuous variables did not have identifiable distributions,
and could not be transformed to normality using power or log
transformations. Nor is it entirely clear that one would need
to use a transformed variable in subsequent analysis.

The independent variables consist of a mixture of
continuous and discrete values, with both ordinal and ratio
scales. Within the independent variables there are two
principal sets of related variables. The intelligence test
scores, AFQTP, OAFQTP, EIMCAT, and to a lesser extent GTSCR,
are all derived from the ASVAB. These variables differ from
one another in varying degrees, and are either a
re-expression, transformation, or a similarly derived set of
scores.

The two academic performance measures, EDLVL and HIYRED,
are related, in that EDLVL is simply HIYRED plus any
additional schooling since entry into the Army.

Despite the similarities within these two sets of
variables, it is felt that sufficient differences in
informational value are present in each expression. Further,
since the variables used are all standard data collection
items for the DMDC database, each variable expression will be
studied. The relative merit of any single or combined
variable from this study may be useful to managers seeking
appropriate data sources for other studies.



An important result of the analysis of these study 
variables is the observation that many of the necessary 
assumptions for standard parametric hypothesis testing, 
Analysis Of Variance (ANOVA), and possibly regression will 
not be met. These include assumptions about the form of the 
distribution as well as the scale of the variable. In this 
study, analysis will initially seek to use standard 
parametric methods. However, if results of the analysis are 
sensitive to distributional or scale assumptions, those 
assumptions will be checked. If examination of assumption
requirements fails, or if there is a nonparametric test of
similar efficiency, nonparametric tests will be conducted as
a replacement or as a confirmatory procedure.


C. BIVARIATE ANALYSIS 
This section will concentrate on identifying
relationships between pairs of variables, and on identifying
shifts in distribution as a function of the effects, or
categorical, variables. Three methods of analysis will be
used in this section. The first method is analysis of
association using a matrix of Pearson product-moment
correlations. This will provide initial information as to
the strength of association between any two variables, and
the direction of that relationship, being either positively
or negatively correlated. The second method will be analysis
of scatterplots of pairs of variables, using the techniques
of LOWESS and jittering to better view any trends in the
variables. This method will give initial information on what
type of fitted line, and hence what mathematical
relationship, exists between independent and dependent
variables. Of significant interest will be whether the
relationship is fundamentally linear, or whether it is
possibly polynomial or curvilinear. The third and final
method used will be analysis of three-dimensional empirical
distribution plots. This will demonstrate some shifts in
distribution within several of the effects variables.
1. Correlation Matrix 
As earlier mentioned, the purpose of reviewing the
Pearson product-moment correlation matrix is to identify
pairs of variables which have a strong association. The
range of the correlation coefficient, rho, is from -1 to +1,
and a value of zero indicates that the variables have no
linear association with each other. A value of +1 indicates
an exact direct linear relationship, while a -1 indicates an
exact inverse linear relationship. This measurement of
association is not completely indicative of dependency, and
is only a preliminary tool to identify candidate variables
for testing and subsequent inferential statistics.
Remembering the central question of this thesis, the most
important pairs of variables will then be any of the
intelligence and academic scores paired with the promotion
rate variables. Of almost equal interest will be any
interval scale effects variables demonstrating a strong
linear relationship with the promotion variables.

The strength of the linear relationship between two
variables, or its level of significance, is based on how much
variance there is in the estimated value of rho. Further,
the variance of rho is dependent on the sample size being
considered. For example, if the sample size were small, and
the value of rho had a standard deviation of plus or minus
.3, then a large positive or negative value of rho would be
needed to effectively demonstrate significance. Conversely,
for a large sample set with very small standard deviation for
rho, a much smaller rho value could be considered
significant. An estimate for the standard deviation of rho
can be found by computing the inverse of the square root of
the sample size. Considering the thesis sample size of
37,854, the resulting estimate of the standard deviation of
rho is .005139. Thus, a value of rho different from zero by
plus or minus .01, about two standard deviations, could be
considered significant.
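
The arithmetic behind this threshold can be checked in a few lines. The sketch below assumes only the rule of thumb quoted above (standard deviation of rho approximately 1/sqrt(n)); the function name is illustrative and is not part of the thesis software.

```python
import math

def rho_std_estimate(n):
    """Rule-of-thumb standard deviation of a sample correlation: 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)

n = 37854                    # thesis sample size
sd = rho_std_estimate(n)     # roughly 0.00514
threshold = 2.0 * sd         # two standard deviations, roughly 0.01

print(f"sd = {sd:.6f}, two-sd threshold = {threshold:.4f}")
```

With a sample this large, even correlations of a few hundredths exceed the two standard deviation band, which is why small rho values are treated as significant in the tables that follow.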

In Table X the complete Pearson product-moment
correlation matrix for the study variables is given. The
Pearson product-moment computation is a parametric method and
assumes pairs of normal and continuous variables. This is
the preferred method since we are primarily interested in
correlations with either the RATE or PRA variable as one of
the pair of variables. Additionally, it is possible, using
the Spearman nonparametric method, to compute a correlation
value rho for pairs of ordinal or higher scale variables.
[Ref. 13:pp. 251-253] The Spearman method is a distribution
free method providing correlations based on the ranks of the
variables. The last column on the second part of Table X
lists the correlations computed using the Spearman method.
Comparison of Spearman versus Pearson values showed that
there was an acceptable correspondence between the two
methods, and Pearson values are used exclusively to simplify
analysis.
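
The comparison of the two coefficients can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis data: the Spearman value is computed as the Pearson correlation of the ranks, and the variable names and the simulated relationship are assumptions made only for the example.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

random.seed(0)
score = [random.gauss(50, 10) for _ in range(2000)]    # hypothetical test score
rate = [0.02 * s + random.gauss(0, 1) for s in score]  # hypothetical promotion rate

r_p, r_s = pearson(score, rate), spearman(score, rate)
print(f"Pearson = {r_p:.3f}, Spearman = {r_s:.3f}")
```

When the two values agree closely, as they should for data of this kind, little is lost by reporting the Pearson values alone.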

Even with application of both the Spearman and Pearson
methods there remained several pairs of variables which did
not meet the assumed distributional characteristics for
correct interpretation of the rho value. These variables are
the discrete, nominal variables SEX, RACETH, and possibly
CMF. Their results are included in Table X, but any
interpretation of the rho value would be ineffective. The
most important rho values in Table X are located under the
PRA column and are underlined.

TABLE X Pearson Correlation Coefficients


PRATE RATE PRA GTSCR AFQTP OAFQTP EIMCAT PQSCR


PR ae OC 
RATE wee 
PRA a7 ae 


PEARSON COEFFICIENTS CONTINUED SPEARMAN
PAYGD HIYRED EDLVL NCOE SEX RACETH PRATE


PRATE. =. 495 el Sees ~ 200 O13 -064 1.000 
RATE —-. 000 = [6 Smuts . 047 oes : .084 .808 
PRA COO Mio Seto CCS - 036 -056 Jia 
GTSCR  .143399. 21C 3226 -OS2 .054 ~-242 - 020 
AROQTP .087 “22is) facoo ee oe : 306 - O77 
OAP OTP .031 245552266 . 060 .049 ° oe - lO 
EIMCAT 023” 320 Sie 242 -062 -O6s ones los 
HIYRED JCOMMEAOOS  .708 -C62 Se 024 ~147 
EDEVE. 4.098. 7/ComiL. 000 .004 -114 SO -CSe 
NCOE -433 ~.063° 2004 OOO /Oer ‘ —O15 - 206 
SEX wees? .«bSLE wala .O81 COG : .042 020 
CMF = 055 «L465 2.1977 - 184 -256 2 .069 
RACETH 22046 7024 57033 O25 .-042 .000 . 092 
PAYGD E000 §.000 09a 432 : : eae OaeG -ooS 
PQSCR «C97. COG=e. 100 =C33 : TZ

The most significant observations from the tables are
summarized as follows:

For the variable RATE there is zero correlation with the 
PAYGD variable. Thus, the transformation of PRATE to RATE 
did remove the influence of paygrade on promotion rate. 
Similarly, for the variable PRA, both PAYGD and CMF have zero 
correlation. 

As expected, the three promotion rate variables are all
highly correlated in a positive direction.

With two exceptions, the correlation values for the
effects and independent variables have similar magnitudes and
signs across all three expressions of promotion rate. The
first exception is the NCOE variable. Under PRATE it is
negatively correlated with a magnitude of 0.2, and positively
correlated with lower values for RATE and PRA. This result
makes sense when one considers that NCOE is highly correlated
with PAYGD (0.565). Specifically, raw promotion rates are
lower for higher grade NCO’s due to time in service and time
in grade requirements (-.495). Hence, NCOE, which is highly
correlated with PAYGD, will also reflect that inverse
relationship. When the influence of paygrade is eliminated,
as it is in RATE and PRA, this negative correlation is
incidentally removed.

The second exception is for the variable SEX, which is
positively signed for PRATE and PRA, but negatively signed
for RATE. The magnitudes of all three values are close to
zero. An explanation for the difference in sign between PRA
and RATE will be presented in the analysis of empirical
distributions and coded scatterplots.

Groups of closely related variables have generally the
same correlation across the three promotion variables.
Specifically, AFQTP, OAFQTP, EIMCAT, and to a lesser extent,
GTSCR, all demonstrate a strong positive correlation with
each other, and show the same trend when compared against the
promotion rate variables. The academic variables HIYRED and
EDLVL demonstrate similar characteristics; however, EDLVL is
weaker than HIYRED with respect to the promotion rate
variables.

Considering RATE and PRA as the better promotion
variables to model with, and allowing for only one variable
from each of the related groups, the six most significantly
correlated variables were selected. These variables, listed
in descending absolute value of rho, are shown in Table XI.


TABLE XI Most Significant Correlated Variables
Considering both RATE and PRA

Variable     Rho Value
HIYRED       approx 0.17
OAFQTP       approx 0.14
GTSCR        approx 0.10
PQSCR        approx 0.09
RACETH       approx -0.06
NCOE         approx 0.006





These variables, paired either with RATE or PRA, were
used as the starting basis for multivariate regression
analysis. The effects variable SEX was included for
subcategory analysis in an effort to detect any influence it
might have on the primary relationships.
2. Paired Scatter Plots and Simple Regression
Plots of paired independent and dependent variables
were implemented to accomplish two purposes. The first
purpose was to visually search for any dominant plotting
patterns. Since the rho values found in the previous section
are designed to detect only linearity, it is quite possible
that nonlinear relationships could exist between the
explanatory and dependent variables. For example, if the X-Y
relationship was strictly Y=X², with X values symmetric about
zero, a computed rho value should be zero. Thus, if one
relied only on correlation coefficients to detect
relationships, he would be misled into thinking that no
relationship existed between the two variables. Simply
plotting scatterplots of the explanatory variables with the
promotion variables did not require specification of the
response of the dependent variable. Visual observation could
then be relied upon to detect dominant patterns of any form.
These scatterplots used two special procedures, LOWESS and
Jittering, which will be described in analysis of Figures
4.12 and 4.13.
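
The quadratic example can be verified numerically. The sketch below uses made-up X values placed symmetrically about zero, which is the condition under which the linear correlation of X and X squared vanishes; it is an illustration only and uses no thesis data.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [i / 10.0 for i in range(-50, 51)]   # -5.0 to 5.0, symmetric about zero
y = [v * v for v in x]                   # exact relationship Y = X squared

rho = pearson(x, y)
print(f"rho = {rho:.6f}")                # effectively zero despite perfect dependence
```

Y is completely determined by X here, yet rho carries no trace of it, which is the reason for inspecting the scatterplots directly.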
Secondly, simple least squares regression was performed
for all variables which had been previously found to be
significantly correlated. The simple least squares
regression procedure yielded a value called the Coefficient
of Determination, or R2 (R-square). R2 is mathematically
related to rho, and in the one variable case, the square
of rho is equal to R2. Thus, R2 can also be used to
qualitatively interpret the strength of linearity for a
simple linear model. The advantage of producing R2 values
was that R2 directly represents the proportion of variance
accounted for by the assumption of a linear model. The
results for each of the regressions and an explanation of R2
will be discussed in analysis of Table XII.
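
The identity between R2 and the square of rho in the one variable case can be demonstrated on synthetic data. In this sketch R2 is computed from the residuals of a fitted line and then compared with the squared Pearson correlation; the simulated data and function names are assumptions for illustration.

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Proportion of the variance of y explained by the fitted line."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
x = [random.uniform(0, 100) for _ in range(500)]    # hypothetical score
y = [0.01 * v + random.gauss(0, 1) for v in x]      # hypothetical promotion measure

r2 = r_squared(x, y)
rho = pearson(x, y)
print(f"R2 = {r2:.6f}, rho squared = {rho ** 2:.6f}")
```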
a. Paired Scatterplots 

Since interpretation of the correlation 
coefficients assumes linearity, visual analysis of pairwise 
scatterplots was used to search for observable patterns, 
linear or otherwise. This visual approach did not require 
interpretation of single derived parameters to identify any 
patterns. 

In producing the scatterplots the LOWESS procedure was
used. LOWESS, which stands for Locally Weighted Regression
Scatter Plot Smoothing [Ref. 12:pp. 94-95], is a nonparametric
smoothing procedure which is designed to estimate functional
relationships between Y and X. In particular, no linear or
quadratic relationship is assumed. For scatterplots of
discrete variables against the continuous promotion rate
variables, the discrete variables were Jittered to overcome
repeated plotting of points. Jittering involves generating
small random increments, which are then added to the X
values. As a result, when the X-Y plot is performed fewer X
values are repeatedly plotted in the same location, and a
better visual interpretation can be made of the quantity of X
values at a discrete level.
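
Both techniques can be sketched compactly. The code below is a simplified stand-in for the procedures described above, not the software used in the thesis: the smoother fits a tricube-weighted straight line around each point and omits the robustness iterations of the full LOWESS algorithm, and the jitter width, the `frac` parameter, and the simulated data are all assumptions.

```python
import random

def jitter(values, width=0.2, rng=random):
    """Add a small uniform random increment to each discrete value."""
    return [v + rng.uniform(-width, width) for v in values]

def lowess(x, y, frac=0.5):
    """Locally weighted linear fit at each x (tricube weights, one pass)."""
    n = len(x)
    k = max(2, int(frac * n))            # number of neighbours in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12        # bandwidth: distance to k-th neighbour
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

random.seed(1)
levels = [random.choice([9, 10, 11, 12, 13, 14]) for _ in range(300)]  # discrete X
y = [0.1 * (v - 12) + random.gauss(0, 1) for v in levels]              # noisy response

xj = jitter(levels)       # spread repeated X values so points do not overplot
smooth = lowess(xj, y)    # smoothed Y value at each jittered X
```

Plotting `smooth` against `xj` over the raw points gives the kind of trend line discussed for Figures 4.12 and 4.13; a nearly straight smooth curve is what supports the choice of a linear model.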

The overall results of the LOWESS plots showed that the
predominant pattern was indeed linear. Further, the linear
pattern was demonstrated most clearly between pairs of highly
correlated variables. Figures 4.12 and 4.13 demonstrate that
linearity and the LOWESS and Jittering techniques
respectively. As a result, linear modelling techniques were
considered to be the best choice for subsequent analysis.


LOWESS SCATTERPLOT OF HIYRED VS PRA


b. Simple Regression

For pairs of significantly correlated variables,
a simple least squares regression plot using PRA as the
dependent variable was accomplished. The simple least
squares regression for pairs yields quantitative results in
terms of slope values, intercept values, tests of the slope
and intercept values, and the R2 value.

The R2 value represents what proportion of total variance
was explained by the simple linear model. As such, its
values range from zero to one. An R2 value of zero would
indicate that a linear model does not account for any
variance of the dependent values. Correspondingly, a value
of zero would be the estimate of the slope of the line. The
significance of R2, like rho, is related to sample size. To
determine the significance of an R2 value, the results of the
T test for the slope of the model are checked. If the T
statistic is large and the probability of a greater T value
is small, the null hypothesis of a slope of zero is strongly
rejected. Thus, we can be confident of the linearity of the
model and the derived slope estimate. Sample size is
considered in this test because the T statistic is computed
as a function of sample size. Thus, even with a small R2
value, if the T test for the slope were significant, the R2
value would necessarily be held as significant. The only
qualification for a low R2 value would be that there exists
considerable ‘noise’ or unaccounted variance in the response
of the dependent variable. A summary of results is shown in
Table XII.
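
The dependence of the T statistic on sample size can be made concrete. For a regression with one predictor, the squared T statistic for the slope equals R2(n-2)/(1-R2). The sketch below applies that identity to an illustrative R2 value, which is an assumption and is not taken from Table XII.

```python
import math

def t_from_r2(r2, n):
    """T statistic for the slope of a one-predictor regression, from R2 and n."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))

r2 = 0.005                           # illustrative small R2
t_large = t_from_r2(r2, 37854)       # thesis-sized sample
t_small = t_from_r2(r2, 30)          # small sample, same R2

print(f"n = 37854: T = {t_large:.2f};  n = 30: T = {t_small:.2f}")
```

The same R2 that is far from significant in a sample of thirty is strongly significant in a sample of nearly thirty-eight thousand, which is the point made above about small R2 values.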




TABLE XII Simple Least Squares Summary Data
Using PRA as Dependent Variable


Variable Intercept Std Err Slope Std Err R2 T
GTSCR Oro oO GOOG Ll) O08 (>. SE -OA ) Poles 13.8 
AFQTP wO2358 (0.014 —) OOOG sa COLO002Z T=. 0LG* 26.1 
Sore es 0 13365061 .6E=02) 0.007 (3.2E-04) SOSS* 22255 
Ercan L mroua. (O2077 ) —-90.003 ~(0.005 _) JOG ie) 
wee O.005 (0.047 “son700l (0,6e8™) . 000 Sere 
EDLVL Oe Ole O20 54) 0.008 (6.008 ) .000 Boe 
NCOE moO OCOD CO. O2 ban OOo (O6008. ) -0G1@ ieee 
Sy) By! Oi CO, OZn oe UO. Ote (Or..024  ) . 000 ee 
CME wOmO2 Sa Ol 6 E—02) O00 (2. 6E-049e 000 9 
moet -O.009  €O.01S $y -0.001 (0.010 ) .000 sel 
PAYGD pOo.045 (Om 09'3 )2) 0.007 (oRO1S ) .000 eo 
PQSCR eOROSI —C5.4E-O2Z) O- 007) (6 -9E-04) sO0825 10.6 


Important observations from the simple paired regression
analysis are summarized in the following paragraphs.

Very few of the pairs result in a significant R2 value.
Those that do are: GTSCR, OAFQTP, and PQSCR. All three of
these variables have a positive slope. Analysis of residuals
for these pairs did show reasonable normality of residuals
and did not demonstrate any lack of homoscedasticity.

The remaining variables have a low value positive or
negative slope. For each of these variables, the 95%
Confidence Interval for the slope spans zero, so that the
slope could be either positive or negative. Thus,
no observable ascending or descending relationship can be
claimed.
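
The confidence interval check described above amounts to the following. This is a sketch using the large-sample normal multiplier of 1.96; the slope and standard error values are hypothetical and are chosen only to show one interval that spans zero and one that does not.

```python
def slope_interval(slope, std_err, z=1.96):
    """Approximate 95% confidence interval for a regression slope."""
    return slope - z * std_err, slope + z * std_err

def spans_zero(interval):
    """True when zero lies inside the interval, so the sign is undetermined."""
    low, high = interval
    return low <= 0.0 <= high

weak = slope_interval(-0.003, 0.005)      # hypothetical weak predictor
strong = slope_interval(0.007, 0.0007)    # hypothetical strong predictor

print(weak, spans_zero(weak))
print(strong, spans_zero(strong))
```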

Using the variable RATE as the dependent variable in
the simple regressions results in the variables EIMCAT and
AFQTP having measurable R2 values and positive slopes.

As expected, the results of the simple regression 
analysis coincide with observations taken from the 
correlation table. 

When considered one at a time, there appear to be only a 
handful of variables demonstrating a reportable relationship 
with the promotion variables. The low R2 value for each 
regression indicates either a large proportion of pure error, 
or significant unexplained variance due to other explanatory 
variables not being included. 

3. 3-D Empirical Density Plots

Three dimensional empirical density plots were used 
to visually check for distribution changes in the continuous 
variables within the subcategories of SEX, PAYGD and RACETH. 
Two such plots will be discussed because they depict visually 
data characteristics identified in earlier tabular results. 
These characteristics were: the application of AFQT 
restrictions by congressional mandate in 1980, and the 
differences in OAFQT scores across racial groups. 

The AFQT restriction is depicted in Figure 4.14, where
empirical densities for OAFQT are plotted for each paygrade.
Observing the three densities shows that only the E-7
paygrade distribution contains scores less than twenty. This
makes sense, considering that all the E-7 enlistments were
prior to 1980. Another interesting observation from this
plot is that high OAFQT scores become more dominant as
paygrade increases. This is most apparent in comparing the
E-7 density to either the E-5 or E-6. This shift in density
of OAFQT across the three paygrades suggests that attrition
tends to manifest itself in the lower AFQT categories, but
that a low AFQT score is, in itself, not prohibitive in
achieving senior enlisted rank.

The second 3-D empirical density plot, Figure 4.15, shows
the differences in renormed AFQT scores across racial
subcategories. A large discrepancy between the distribution
for whites and that for blacks or hispanics is easily seen,
although Indians have a similar AFQT distribution to that of
whites. This observation coincides with the occurrence of
different promotion rates between different racial categories
as well. However, to make inferences about promotion policy
among races would require further research. As pointed out by
Daula [Ref. 1:pp. 7-10], the attrition pattern among
different racial groups shifts the averages for both
promotion rate and AFQT among the races over time. Since the
purpose of this thesis is one of prediction, it is more
important to identify the effect and account for it in the
model. An explanation as to the cause of this phenomenon
does not appear to be easily obtained from the thesis data.

What is important about this plot is that it visually
demonstrates the correlation between RACETH and OAFQT. If
OAFQT is a significant determiner of promotion rate, then
RACETH will be an important covariate.


3-D EMPIRICAL DENSITY PLOT
OAFQT BY PAYGD

EMPIRICAL DENSITY

Figure 4.14


3-D EMPIRICAL DENSITY PLOT

Figure 4.15


D. MULTIVARIATE GRAPHICAL ANALYSIS 


Multivariate graphical analysis consisted of the use of
Draftsman Plots and Coded Scatter Plots to look for
relationships when more than two dimensions were under
consideration. [Ref. 12] One of these
procedures, the Coded Scatterplot, will be utilized to
demonstrate a significant data characteristic, that
characteristic being the distribution of SEX, CMF, and PRA,
in Figure 4.16.

Coded Scatterplots involved delineating one of the
effects variables as a third dimension, while plotting an
independent variable against a dependent promotion variable.
In Figure 4.16, CMF values were Jittered and plotted against
the PRA variable, and the plot points were coded as periods
for males and the letter F for females.


CODED SCATTERPLOT
PRA VS CMF WITH SEX

Figure 4.16


Figure 4.16 demonstrates the higher density of female
personnel in the upper CMF range, which contains the more
technically oriented career management fields. This
corresponds to the CMF-SEX correlation coefficient of 0.258
found in Table X. Likewise, the distributions of both the
female and male PRA scores are symmetric about the zero line.
This corresponds to the zero value for the PRA-SEX
correlation coefficient also found in Table X.


E. LINEAR MODELS
1. Analysis of Variance 

One-Way ANOVA was used in this thesis as an
intermediate step in defining a final inference model.
ANOVA’s usefulness has been as an investigative tool to
detect differences in means among classes of explanatory
variables. For example, using PRA as the dependent variable
and EIMCAT as the independent variable, One-Way ANOVA will
compare and test the equality of the average PRA score across
the eight levels of EIMCAT, i.e., mental categories one
through eight. In the testing, the null hypothesis is that
all eight mental category PRA means are equal, while the
alternate hypothesis is that they are not. The test
statistic used to reject or accept the null hypothesis is the
F statistic. As such, a large F value, and subsequent
rejection of the null hypothesis, would indicate that there
exist significant differences between the means of the
promotion scores for some of the eight mental categories. In
general, a large F value can be considered to be any computed
F statistic greater than 3.8, the asymptotic 95 percent point
for a one degree of freedom model. The nature of these
differences could be a large discrepancy between a single
pair of categories, small discrepancies between all eight
categories, or any combination of difference conditions.
Thus, ANOVA has limited value in discerning the location and
magnitude of the differences between category means, but it
does identify if differences exist and how strong those
differences are.
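
The F statistic described above is the ratio of the between-level mean square to the within-level mean square. The sketch below computes it directly for a small made-up data set of three levels; the function name and the numbers are illustrative and are not thesis results.

```python
def one_way_anova(groups):
    """Return (F, R2) for a list of groups, each a list of responses."""
    all_y = [v for g in groups for v in g]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    # Between-level sum of squares: level means about the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-level sum of squares: observations about their own level mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r2 = ss_between / (ss_between + ss_within)
    return f_stat, r2

# Three hypothetical category levels with three scores each.
f_stat, r2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}, R2 = {r2:.4f}")
```

A large F here says only that the three level means are not all equal. It does not say which levels differ, which is the limitation noted in the text.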

Table XIII tabulates a twelve by three matrix of results
for separate One-Way ANOVA’s. The rows are the twelve
explanatory variables and the columns are the three promotion
variables. Using all three promotion measures as the
dependent variable allowed for a check of ANOVA values and
trends across those measures.

In addition to the results of the F test, a value of R2
is reported. This R2 value is different from that reported
in the simple linear regression model. This is because the
ANOVA procedure considers the independent variable as a set
of levels, rather than a single continuous variable. With
One-Way ANOVA, all variables had some level of R2 reported.
Further, because of the increased informational value of
variable categories, and hence, more degrees of freedom for
computation, the values of R2 increased above the simple
regression reported values.
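
The difference between the two R2 values can be illustrated with a deliberately nonlinear example. In the sketch below the level means rise and then fall, so a straight line explains nothing while the level means explain most of the variance. The data are invented for the illustration.

```python
from collections import defaultdict

def r2_linear(x, y):
    """R2 of a straight-line fit, computed as the squared correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def r2_anova(x, y):
    """R2 when x is treated as a set of levels: between SS over total SS."""
    cells = defaultdict(list)
    for level, value in zip(x, y):
        cells[level].append(value)
    grand = sum(y) / len(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_between = sum(len(c) * (sum(c) / len(c) - grand) ** 2
                     for c in cells.values())
    return ss_between / ss_total

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [1.0, 1.2, 0.8, 2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 1.0, 0.9, 1.1]

lin, lev = r2_linear(x, y), r2_anova(x, y)
print(f"linear R2 = {lin:.4f}, ANOVA R2 = {lev:.4f}")
```

Because a straight line is a special case of fitting a separate mean to every level, the ANOVA R2 can never fall below the linear R2 for the same variable.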

It should be noted that technically, when the defined
continuous variables were put into ANOVA, their values were
grouped, and then the variables were treated as if they were
discrete. Because the SAS software and computational
resources used could handle all the integer values for the
score ranges of AFQTP and the other continuous variables, it
was possible to gain insight into the existence of
differences between individual score cells.

Additionally, nonparametric procedures were used to
evaluate the relationships. [Ref. 13:pp. 250-255] The
nonparametric ANOVAs utilized the ranks of the variables and
also yielded the F statistic for testing the hypothesis of
equal level means. Having agreement between the parametric
and nonparametric values removed the need of having to pursue
confirmation of assumptions for parametric ANOVA. It will
also allow analysis of results to focus on the resultant
values of F and R2 tabulated in Table XIII.


TABLE XIII One-Way ANOVA Summary


Variable PRATE RATE PRA
F R2 F R2 F R2

Sr : -00016 BEC ie -00351 ‘ -00128 
Glybses : .02788 93. -07415 ; 00000 
RACETH é ORE 7 7 . O23 3 .01049 
PAYGD? : - 24953 O., -00000 : -00000 
GTSCR ‘ -04250 3. .03184 ‘ -O2Z6036 
AFQTP : .07046 20. -04623 : -03908 
OAFQTP , .08441 ZS. sO 6As@n1 : .04657 
EIMCAT : ~O1076 Ae .02035 ; .02739 
HIYRED : ~-02950 LCG. -O3272 : .03590 
EDLVL : FOLOZS Des 02035 9 - 02739 
NCOE eis .05097 -02499 ae) ~OLSa6 
PQSCR ° 1.9 mOO3 7.5 6¢ ~01341 8 A@lab alist al 


Afr UNOUNWA hOdW W 


1 The Pr>F (level of rejection of the null hypothesis
of no difference in means) was .0145 for PRATE, .0003 for
RATE and .0001 for PRA.

2 The Pr>F for PRA is 1.0.

3 The Pr>F for RATE is 1.0, and for PRA is 1.0.
Values of Pr>F for the remainder of the table were .0001.

Review of Table XIII demonstrates some anticipated
results, which are summarized in the following paragraphs.

Since the variables PAYGD and CMF were controlled for in
the derivation of PRA, there is correspondingly no
relationship between those variables and the PRA promotion
variable. Likewise, the variable PAYGD was controlled for in
the derivation of RATE, and there was no linear relationship
demonstrated for that pair. The zero values for the F
statistic and R2 for those variable combinations document
this fact.

Using RATE or PRA as the dependent variable, and allowing
for only one, most significant variable to be selected from
each of the intelligence and academic groups, results in the
same set of explanatory variables as were found in
correlation analysis. These variables were: HIYRED, OAFQTP,
GTSCR, PQSCR, RACETH, NCOE, and SEX. The most significant
variables were the ones which had the larger F statistic and
R2 value. This set is not ordered, however, since there are
differences in order between the PRA and RATE models.

Another interesting development from ANOVA results when
the explanatory variable mean and variance for each level are
plotted against the promotion variable. This is not a standard
analytical plot, but it does provide some visual information
on the size, direction, and dispersion about the center line
of an independent discrete variable. This plot is most
similar to a strip box plot for continuous variables.

An example plot where each individual’s PRA score was
plotted against the sum of his EIMCAT and HIYRED scores is
shown in Figure 4.17. In Figure 4.17 the two center lines
plotted represent the sum of scores for EIMCAT and HIYRED,
separated between the GED qualified personnel and High School
Diploma qualified personnel. The outside two lines trace the
upper and lower bounds one standard deviation from the
computed means.
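
The quantities traced in such a plot are the mean of the promotion score at each level of the explanatory variable, together with bounds one standard deviation either side. A minimal sketch of that tabulation follows; the function name and the toy data are assumptions, and the sample standard deviation is used.

```python
import statistics
from collections import defaultdict

def level_summary(levels, scores):
    """Map each level to (lower bound, mean, upper bound), bounds at one SD."""
    cells = defaultdict(list)
    for level, score in zip(levels, scores):
        cells[level].append(score)
    summary = {}
    for level in sorted(cells):
        mean = statistics.mean(cells[level])
        sd = statistics.stdev(cells[level])   # needs at least two scores per level
        summary[level] = (mean - sd, mean, mean + sd)
    return summary

# Hypothetical combined EIMCAT + HIYRED levels and PRA scores.
levels = [1, 1, 2, 2]
scores = [0.0, 2.0, 1.0, 3.0]
summary = level_summary(levels, scores)
for level, (low, mean, high) in summary.items():
    print(level, round(low, 3), mean, round(high, 3))
```

Running the same tabulation separately for each diploma category would give one center line and one pair of bounds per category.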


X-Y PLOT OF MEANS AND VARIANCES
PRA VS HIYRED + EIMCAT

Figure 4.17


By plotting a separate line for each high school diploma
category it can be seen that while both groups have a similar
increase in promotion rate as the combined level of EIMCAT
and HIYRED increased, the GED qualified personnel were
consistently a fixed level lower than fully qualified high
school graduates. Thus, the additional merit of an actual
high school diploma did manifest itself in promotion rate.

A final look at ANOVA involves specifying a model using
the set of the seven most significant independent variables,
and then checking for interactions among them. Table XIV
gives the results of the Seven-Way ANOVA using this model:

RATE = 7 Main Effects + Two Way Interactions

Table XIV depicts the seven most significant variables
individually in the Main Effects rows, and the interaction
terms in the Interactions rows.

The advantage of this Seven-Way ANOVA is that inclusion
of all of the explanatory variables simultaneously allows for
comparison of the significance of each of the explanatory
variables relative to the others. Additionally, specifying
combinations of two-way interactions checks to see if any two
of the explanatory variables are significantly related to one
another. An example of an interaction would be a SEX and CMF
term. As has been previously shown, female personnel tend to
be associated with higher CMF values. If the ANOVA model for
promotion included a term which was the product of the two
values, SEX*CMF, then the two attributes would be jointly
considered in the ANOVA model. If the interaction term was
found to be significant, then the two individual variable
entries for CMF and SEX would be removed and only the
interaction term retained.


An additional consideration in the Seven-Way ANOVA was
that the model was unbalanced. Unbalanced means that there
were some combinations of the factor levels which did not
have any entries in the ANOVA cells. An example of this can
be seen in the SEX*OAFQT term. Specifically, there are only
76 degrees of freedom for the interaction term, while the
individual degrees of freedom for SEX and OAFQT are 1 and 79
respectively, which would give 1 x 79 = 79 degrees of freedom
if every cell were filled. Thus, the SEX*OAFQT term had three
combinations without entries. As a result, the F statistic
computed will be only approximate. Since the purpose of this
step in analysis was exploratory, the F statistic estimates
were considered adequate.
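
The degrees of freedom bookkeeping can be checked as follows. The sketch assumes the usual count for a two-factor layout in which the filled cells remain connected: the interaction degrees of freedom equal the number of filled cells, less the number of levels of each factor, plus one. The function name is illustrative.

```python
def interaction_df(levels_a, levels_b, empty_cells=0):
    """Interaction degrees of freedom for a two-factor layout with empty cells."""
    filled = levels_a * levels_b - empty_cells
    return filled - levels_a - levels_b + 1

# SEX has 2 levels (1 df) and OAFQT has 80 levels (79 df).
full = interaction_df(2, 80)                     # every cell filled
observed = interaction_df(2, 80, empty_cells=3)  # three empty combinations

print(full, observed, full - observed)
```

The shortfall of three degrees of freedom matches the three empty SEX by OAFQT combinations noted in the text.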

Table XIV presents the results of a Seven-Way ANOVA using
RATE as the dependent variable. Similar results were
obtained using PRA as the dependent variable.




TABLE XIV 7-Way Analysis of Variance with Interaction
DEPENDENT VARIABLE: RATE


SOURCE      DF      SSQ        MEAN SQUARE   F VALUE   PR > F   R2

MODEL       14966   18869.39   1.260818      1.52      0.0001   0.49852

ERROR       22887   18981.65   0.829364

CORRECTED                                    ROOT MSE
TOTAL       37853   37851.04                 0.91069421

SOURCE DF ANOVA SS F VALUE PR > F
Main Effects 
RACETH 5 807.35 194.69 OZ COUT 
SEX il is 228 GeO OF OC0T: 
OAFQT 79 67 O54 ZO @l 4 eel 
HIYRED 12 236s 5 124.42 OF OC0r 
Grock 93 HOS), 22 1PSe60.5 0.0001 
NCOE lee 945.89 SJ a7s 0.0801 
PQSCR 78 BOT. 52 7.05 O700C1 
Interactions 
RACETH#SEX 5 O-0e OLS LONG, Pe O0e 
SEX*OAFQT Tae 440.59 6.99 0.0001 + 
SEX*HIYRED 9 So. 03 8.85 C2 00OCIs 
SEX*GTSCR Te. 72280 ee 22 0.0999 
SEX*NCOE allt 57 i706 62as 0.0001 + 
SEX*PQSCR 70 3.06 0.91 0.6795 
RACETH*OAFQT 335 0.00 O= 00 1. VOO0 
RACETH*HIYRED 46 107.84 22°36 0.0001 « 
RACETH*GTSCR 326 Ovo Oo. 00 i OO 0.0 
RACETH«NCOE 46 8.41 Ons 2Z 1. OCO8 
RACETH*PQSCR 288 104.24 0.44 im COUO 
OAFQT*HIYRED 593 1 262 O 226 1.06800 
OAFQT*GTSCR 2864 2418.55 OZ ©. 2570 
OAFQT*«NCOE 614 954.24 eo O-OCOl t= 
OAFQT*PQSCR 3631 agers S 1 OG Ores? 
HIYRED«GTSCR 564 igo ee Oe. 28 LO OOO 
HIYRED«NCOE 88 276.98 3,60 O70 0 Ola 
HIYRED*PQSCR 518 484.13 el OF2025 1) 
GTSCR*NCOE 604 7 iloe oe 1.44 O. 0001, 4 
GTSCR*PQSCR 3383 2997.93 leo 0200.51 
NCOE*#PQSCR 542 504.44 lee le Deo 206 


Three important observations can be obtained from Table
XIV. The first observation is that there are few significant
interaction terms. Only those terms marked with an asterisk
demonstrated statistical significance with the PR > F at
level .0001. Of these, only three had F values greater than
3.8. These interaction terms were OAFQTP, HIYRED, and NCOE,
all interacting with SEX. The presence of interaction seen in
the Seven-Way ANOVA model was previously observed in the
correlation matrix, Table X, where SEX was positively
correlated with HIYRED and OAFQTP (0.05 and 0.049,
respectively), and negatively correlated with NCOE (-0.081).
The implication of having significant interaction terms is
that they would need to be included in any predictive model.
Thus, identification of interactions using ANOVA was
critical.

Secondly, all the main effects variables continue to be
significant, even when used simultaneously by the model.

Lastly, selecting the single most significant explanatory
variable from the academic and education groups yields the
same unordered best set as did the One-Way ANOVA: OAFQTP,
HIYRED, GTSCR, NCOE, RACETH, and SEX.

In summary, the fundamental result of ANOVA was the
confirmation that there are differences in the level means of
promotion scores due to several independent explanatory
variables, and an agreement as to which were the best
explanatory variables when considered separately or
simultaneously.

Also, plotting the means and variances of the sum of 
EIMCAT and HIYRED versus PRA demonstrated that there was a 
good increasing linear trend of the level means with PRA. 
However, there was considerable variance within each class 
level. The choice of EIMCAT and HIYRED as the explanatory 
variables was important because those variables are both 
discrete representatives from the academic aptitude and 
education groups. 
2. ANCOVA 
The use of One-Way Analysis of Variance in the 
previous section was primarily to confirm the existence of 
significant differences among the levels of the independent 
variables. Beyond acknowledging that there are some 
independent variables available to explain promotion rates, 
Seven-Way ANOVA did not provide any numerical measure of the 
structural form of the contribution of a given independent 
variable to the model. [Ref. 14:p. 10] In addition, in the 
analysis of the continuous variables, the nature of the 
variable was changed to represent a discrete valued variable. 
Incorporating continuous variables into ANOVA was 
achieved through the intermediate method of ANCOVA. ANCOVA 
utilizes metric continuous variables as well as nonmetric 
qualitative values. The result of ANCOVA was an improved 
multivariate model with the inclusion of continuous variables 
in their proper form. ANCOVA provided estimates of the 
linear coefficients for the continuous variables, and 
reported on the proportion of variance accounted for by each 
categorical variable as well. These results provided the 
basis for further removal of variables or interactions from 
the set previously identified. [Ref. 15:pp. 343-349] 
The model considered was based on the results of the 
previous chapters and consisted of the following form: 
Promotion = f(OAFQTP, PQSCR, GTSCR, HIYRED, NCOE, RACETH, SEX, 
plus interaction terms SEX*HIYRED, SEX*NCOE, SEX*OAFQTP) 
The variables OAFQT, PQSCR, and GTSCR are metric and 
continuous, HIYRED and NCOE are discrete and metric, and 
RACETH and SEX are discrete and nonmetric. 
A representation of the model using notation consisted of 
the following form: 

Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + D_1 + D_2 + ... + D_4 + I_1 + ... + I_3 

In the above notation, Y_i is the promotion variable PRA, 
B_0 is the linear intercept, and B_1 through B_3 are 
coefficients for the continuous variables OAFQT, GTSCR and 
PQSCR. The coefficients B_1 through B_3 are assumed to be the 
same for all levels of the other variables. D_1 through D_4 
represent the discrete variables RACETH, SEX, HIYRED, and 
NCOE. I_1 through I_3 are the interaction terms OAFQT*SEX, 
HIYRED*SEX, and NCOE*SEX. 
This model is also unbalanced and the F statistics are 
estimates. The results of the ANCOVA using this model are 
shown in Table XV. 

TABLE XV ANCOVA with Interactions 

DEPENDENT VARIABLE: PRA 

SOURCE            DF      SSQ        MEAN SQUARE   F VALUE   PR > F   R2 
MODEL             55      2423.68    44.07         47.13     0.0001   0.0642 
ERROR             37798   35339.29   0.934 
CORRECTED TOTAL   37853   37762.98 

ROOT MSE          0.966 

SOURCE          SS                F VALUE    PR > F 
Main Effects 
OAFQT           12.89440024       Se .cZ 9   .0002 
RACETH          152.10095609      32.54      .0001 
SEX             5.31950192        5.69       Oa 
HIYRED          ay OL/ SP 1L6     46.16      .0001 
GTSCR           3.65772995        SO         .0479 
NCOE            ia2ee3314221      10.93      .0001 
PQSCR           SO 28563297 .1    85.73      .0001 
Interactions 
OAFQT*SEX       4.03387863        4.31       .0378 
SEX*HIYRED      10.16825209       ee         .2844 
SEX*NCOE        See 27 ee         1.79       .0496 

PARAMETER    ESTIMATE       T FOR H0:      PR > |T|   STD ERROR OF 
                            PARAMETER=0               ESTIMATE 
INTERCEPT    O.25501 Oe 34 Sa es oele) 1 Oi 
OAFQT        0.00094        1.26           .2077      .00074544 
GTSCR        -0.00104897    -1.98          .0479      .00053034 
PQSCR        0.00422902     9.26           .0001      .00045674 




There are three important observations from Table XV. 
First, the main effects variables, with the exception of 
GTSCR, were still significant in their ability to account for 
variance in the model. 

Secondly, no interaction terms are significant. The PR > 
F values for these terms are much greater than .0001 and each 
has a small F value. Thus, the effect of the interaction terms 
will be assumed to be negligible. 

Lastly, the bottom portion of the ANCOVA table lists 
estimates of regression coefficients for the continuous 
variables. These estimates were tested, using the T 
statistic, to see if they were significantly different from a 
hypothesized value of zero. If the estimate was not 
significantly different from zero, then the explanatory 
variable did not possess sufficient predictive ability. 

The PQSCR coefficient has a small, but positive slope 
with a value of 0.0042, and is significantly different from 
zero. The OAFQT variable has a slope with the correct sign 
and magnitude, but it is not significantly different from 
zero. The GTSCR variable demonstrates a negative slope and 
again is not significantly different from zero. 

The negative estimate value, combined with the knowledge 
that GTSCR is strongly correlated with OAFQT, indicated a 
condition of multicollinearity between the two variables. 
Multicollinearity implies that one variable may be simply a 
surrogate for the other with little or no effect as a 
predictor. [Ref. 15:p. 415] Thus, the inclusion of GTSCR 
together with OAFQT was considered detrimental to the 
development of a regression model, and it was dropped from 
subsequent analysis. 

In summary, ANCOVA resulted in the elimination of the 
remaining interaction terms from consideration in the 
predictive model. The estimated values of OAFQT and GTSCR 
demonstrated a condition of multicollinearity in the model, 
and the weaker variable, GTSCR, was eliminated. The 
remaining variables to be considered in subsequent analysis 
were: OAFQT, PQSCR, HIYRED, NCOE, RACETH, and SEX. These 
results were considered satisfactory, in that the remaining 
variable set contains single measures of academic aptitude, 
education, professional education, military performance 
testing, as well as two categorical variables: SEX and 
RACETH. 


3. The Final Model: A Multiple Regression (ANCOVA) 


a. Background 

Regression analysis with a reduced set of 
variables was the final step in successive data analyses. 
The important result of this analysis was a set of 
coefficient values which gave numerical estimates of the 
independent influence of each of the explanatory variables. 
Of specific importance was the independent influence of 
OAFQT and HIYRED in predicting an individual's promotion 
rate. 


In the development of the regression model this section 
will: 

1. Review the pertinent results which led to the 
regression model definition. 

2. Compare the model using the three promotion rate 
variables. 

3. Select a single promotion variable for the model. 

4. Interpret the resulting regression estimates and 
conduct sensitivity analysis. 

5. Check model assumptions and confirm the model using 
an alternate data set and nonparametric procedures. 

6. Test the model by comparing actual versus predicted 
promotion rates for population subcategories. 
Previous results are reviewed in the following paragraphs. 

ANOVA and ANCOVA demonstrated that significant 
differences exist between internal levels of the explanatory 
variables as a function of average promotion rates. 

Paired scatterplots utilizing smoothing techniques, and 
plots of the level means found in ANOVA, consistently 
demonstrated an ascending linear pattern when plotted against 
promotion variables. 

ANOVA and ANCOVA models, using interactions, resulted in 
the elimination of variables which did not demonstrate 
sufficient linear additive effect to be included in the 
model. Further, this analysis confirmed that there was no 
significant interaction among the remaining variables. 
Correlation analysis, combined with the in-depth univariate 
analysis as to the nature and scoring procedures of the 
individual variables, identified groups of variables. In 
subsequent analysis, these groups were then restricted to 
allow for only the strongest unique variable to be entered 
into the model. 

The final set of variables for entry into the model is 
the following: 

Promotion = f(OAFQT, PQSCR, HIYRED, NCOE, RACETH, SEX) 
This model is a mixed scale and variable type model, 
including both discrete and continuous variables. Two of the 
input variables have nominal scale, RACETH and SEX. To allow 
for their entry into the model, these values were transformed 
into dummy variables. Specifically, the variable SEX was 
recoded as a 0/1 variable, while RACETH was represented with 
five dummy 0/1 variables: D1 through D5. For example, for 
the RACETH score of 1, the dummy variable D1 was coded with a 
1 for every 1 entry and a zero for all others. This 
procedure was applied for the next four levels, while score 6 
was left as a 0 entry in all five dummy variables. 
[Ref. 15:pp. 332-341] 
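
The recoding just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name dummy_code_raceth is hypothetical, RACETH is assumed to take the integer scores 1 through 6, and score 6 is treated as the reference level that receives zeros in every dummy variable.

```python
def dummy_code_raceth(score, levels=6):
    """Return the 0/1 indicators [D1, ..., D5] for one RACETH score.

    Assumption: scores run from 1 to `levels`, and the last level is the
    reference category, coded as all zeros.
    """
    if not 1 <= score <= levels:
        raise ValueError("RACETH score out of range")
    # One indicator per non-reference level: D_k is 1 only when score == k.
    return [1 if score == level else 0 for level in range(1, levels)]


# Score 1 switches on D1 only; score 6 leaves every indicator at zero.
d_first = dummy_code_raceth(1)
d_reference = dummy_code_raceth(6)
```

Leaving one level without its own indicator keeps the dummy variables from summing to the intercept column, which is why five variables suffice for six scores.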

After application of the recoding just described, the 
regression model can be defined with the notation: 

Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 + D_1 + ... + D_5 + D_6 

In the above notation, Y_i is one of the promotion 
variables, B_0 is the linear intercept, and B_1 and B_2 are 
coefficients for the continuous variables OAFQT and PQSCR. 
B_3 and B_4 are coefficients for the discrete and ordinal 
variables HIYRED and NCOE. D_1 through D_5 represent the dummy 
variables for RACETH, and D_6 represents the dummy variable 
for SEX. 

The data set of 37,854 records was randomly split into 
two separate data files for regression analysis. This 
provided for a different data set to confirm analysis of 
regression coefficients from the first set. Paragraph e.(1) 
of this section compares resulting regression coefficients of 
the model using the second data set. 
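
A random split of this kind might look like the following Python sketch. The text does not state the sizes of the two files or the procedure used, so the equal halves, the fixed seed, and the function name split_records are all assumptions made for illustration.

```python
import random


def split_records(record_ids, seed=0):
    """Randomly partition record identifiers into two disjoint sets.

    Assumption: an even split; the thesis does not give the actual sizes.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    half = len(ids) // 2
    return ids[:half], ids[half:]


# 37,854 records, identified here by hypothetical row numbers 0..37853.
first_set, second_set = split_records(range(37854))
```

Fitting the model on the first set and refitting it on the second gives two coefficient estimates that share no records, which is what makes the later comparison a confirmation rather than a repetition.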

b. Results 


Table XVI lists the regression results of the 
basic model variables. When computing the models for RATE 
and PRATE, the effects variables CMF, and then CMF and PAYGD, 
were reintroduced into the set of explanatory variables, 
respectively. This allowed for comparison of variable 
coefficients and R2 value changes as the dependent variable 
became more restricted. In Table XVI the top paragraph shows 
the ANOVA results of the model and reports the F and R2 
statistics. Each column then gives the regression results of 
each promotion rate model, including a Pr>T value as a measure 
of the strength of rejection for a null hypothesis of zero 
for the estimate value. Values of Pr>T less than .05 are 
considered acceptable for consideration of that variable. 

TABLE XVI Regression Results 

                   PRATE           RATE            PRA 

Added Variables    CMF, PAYGD      CMF             None 

ANOVA F            1317.4          360.3           ZnSe. 
Pr>F               .0001           .0001           .0001 
R2                 vols            .0948           .0546 

Intercept          Oma 2 222       eile OSb92      Siee2esga2 
(std error)        (.002558)       CAO SS 3680     (205500) 
Pr>T               .0001           OOO             .0001 

OAFQT              sOO0Ohoo>S      ZOOM ool        .0042608 
(std error)        (.00000871)     (.0002444)      (.0002492) 
Pr>T               .0001           .0001           .0001 

HIYRED             .0005341        mlae 352        .139484 
(std error)        (.0001529)      (.004851)       (.0049298) 
Pr>T               .0001           .0001           .0001 

PQSCR              .000089         .001608         .0032721 
(std error)        (.000014)       (.000449)       (.0004583) 
Pr>T               .0001           .0001           .0001 

SEX                - 0008582       .022904         .0564079 
(std error)        COOOSOs25)      (.01562)        C2105 5 31) 
Pr>T               .088*           .1427*          sOOGsS 

NCOE               .00008839       ~OUNzZess       .0073740 
(std error)        (-U00DO06G25)   (.0017808)      (.0017949) 
Pr>T               Sys             SOOM:           OOO: 

D1 (RACETH)        .0026347        MOC USS         .01497054 
(std error)        (200112386)     G03 S653)       C7 Gre 65905) 
Pr>T               .0196           .1365*          .6808* 

D2 (RACETH)        Jee? ooo        720963520       -0.0898693 
(std error)        ©0071 2663      GOSS5570)       (.0363089) 
Pr>T               .0008           .0068           sOO1s 

D3 (RACETH)        -.0009404       -.0239592       =—oO 41/7668 
(std error)        C..00M2 7 9)    (.0402483)      (.0422035) 
Pr>T               .4623*          sooo *          noLeo* 

D4 (RACETH)        .00028892       .089059         MOLOO747 3 
(std error)        (.0032534)      Crlo2z?7 07 )   (.1048355) 
Pr>T               PEO. CWA Soy    soo 50F         .9234* 

D5 (RACETH)        -.000224        =, O25 30       -.0138649 
(std error)        C.00ts127)      eae57 22 6 I)   (.058409) 
Pr>T               .9016*          .7067*          .8124* 

CMF                -.000147        POO 5.56/72     NA 
(std error)        (.0000052)      (.0001654) 
Pr>T               .0001           .0001 

D7 (PAYGD)         ,O6012Z7        NA              NA 
(std error)        (.0017904) 
Pr>T               .0001 

D8 (PAYGD)         .017999         NA              NA 
(std error)        (.008774) 
Pr>T               SOOO] 



Observations from the regression table are summarized in 
the following paragraphs. 

The input variables OAFQT, HIYRED, and PQSCR all 
maintained a positive and statistically significant 
coefficient value across all three dependent variables. 

The inclusion of PAYGD with the PRATE variable 
significantly increased the R2 value of the model. 
Conversely, the influence of OAFQT, HIYRED, PQSCR, and the 
other explanatory variables was severely diminished. 

The RATE model is very similar to the PRA model, and has 
generally larger estimate values and a higher R2. However, 
the estimates for RACETH and SEX did not have significant T 
values. 

The PRA model, although having a lower R2 value and 
generally smaller estimate values, had an acceptable T test 
result for SEX. Additionally, the PRA model contained one 
less nominal explanatory variable, CMF. The PRA model, then, 
has fewer and more reliable nominal explanatory variables. 
Since the objective of the study was to focus on academic and 
educational measures as predictors of promotion, the PRA 
model was chosen as the most effective predictive model. 
Subsequent analysis of regression coefficient results was 
conducted with the PRA model. 

c. Interpretation 
Interpretation of the regression coefficients 
will include two points. First, the explanatory variables 
which can effect the greatest change in the dependent 
variable will be identified. Secondly, an example will 
demonstrate the amount of change in a given explanatory 
variable required to achieve a five percent shift in the PRA 
estimate. 

The amount of change in PRA caused by a change of one unit 
of an explanatory variable can be read directly from the 
regression coefficients. However, the total amount of change 
that an explanatory variable can cause in PRA depends on the 
range of the explanatory variable. Table XVII gives an 
ordered listing of the explanatory variables, excluding 
categorical variables, from most to least total influence as 
measured by Net Possible Change. The net possible change is 
simply the number of units in the range of the explanatory 
variable multiplied by the coefficient estimate. 


TABLE XVII Net Possible Change by Explanatory Variable 


Variable   Range    Estimate     Net Possible Change 
HIYRED     1-12     .13948378    1.6738 
OAFQT      1-99     .00426083    0.4218 
PQSCR      21-100   .00327212    0.2585 
NCOE       0-14     .00737406    0.1032 

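
The arithmetic behind Table XVII can be checked with a short Python sketch. The coefficient estimates are those listed in the table; the unit counts (12, 99, 79, and 14) are assumptions, chosen because they reproduce the tabulated products, and the names used here are illustrative only.

```python
# (coefficient estimate, assumed number of units in the range)
EXPLANATORY = {
    "HIYRED": (0.13948378, 12),
    "OAFQT": (0.00426083, 99),
    "PQSCR": (0.00327212, 79),
    "NCOE": (0.00737406, 14),
}


def net_possible_change(estimate, units):
    """Largest change in PRA one variable can produce across its range."""
    return estimate * units


net_change = {
    name: round(net_possible_change(estimate, units), 4)
    for name, (estimate, units) in EXPLANATORY.items()
}

# Order the variables from most to least total influence.
ranking = sorted(net_change, key=net_change.get, reverse=True)
```

Ranking by the product rather than by the raw coefficient is what puts OAFQT ahead of NCOE: its per-unit effect is smaller, but its range is far wider.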
In a qualitative sense, the sensitivity of PRA to each 
explanatory variable can be demonstrated by deriving the 
number of explanatory variable units needed to move from the 
median PRA value up five percent. 

To compute the average value for PRA, the population 
average for each explanatory variable was entered into the 
regression model. The resulting PRA value was 0.0185, which, 
using the normal approximation, lies at the 50.7 percentile 
of the PRA distribution. An upward shift of 5 percent would 
then require the PRA value to lie at the 55.7 percentile. 
Using the standard normal tables to approximate the PRA 
distribution, the PRA value corresponding to its 55.7 
percentile was 0.1434. Checking the sensitivity of each 
explanatory variable consisted of changing a single 
explanatory variable a sufficient number of units to result 
in a PRA value of 0.1434, while holding all other explanatory 
variables at the population average. Table XVIII tabulates 
the increase of explanatory variable units necessary to 
produce a 5 percent upward shift in PRA percentile. 
Alternatively, if the amount required to reach the 55.7 
percentile was not possible within the range of the input 
variable, the maximum amount of available change was listed. 


TABLE XVIII Sensitivity of PRA to Explanatory Variables 


Variable   Average Value   Change To   PRA % 
HIYRED     6.07            7.0         55.9 
OAFQT      45.3            74.0        55.7 
NCOE       3.06            14.0*       54.0 
PQSCR      78.4            99.0*       53.4 


*max value 
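
The sensitivity calculation can be approximated with the following Python sketch, which uses the standard library's NormalDist in place of printed normal tables. It assumes, as the text does, that PRA is treated as a standard normal variable; the coefficients, averages, and maximum values are read from Tables XVII and XVIII, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # PRA treated as standard normal, per the text
BASE_PRA = 0.0185          # model prediction at the population averages
TARGET_PERCENTILE = 55.7   # five percent above the 50.7 percentile

# (coefficient, population average, maximum value)
VARIABLES = {
    "OAFQT": (0.00426083, 45.3, 99.0),
    "NCOE": (0.00737406, 3.06, 14.0),
    "PQSCR": (0.00327212, 78.4, 99.0),
}


def percentile(pra):
    """Percentile of a PRA value under the normal approximation."""
    return 100.0 * STD_NORMAL.cdf(pra)


def sensitivity(coefficient, average, maximum):
    """Value one variable must reach, capped at its maximum, and the
    PRA percentile obtained when all other inputs stay at their averages."""
    target_pra = STD_NORMAL.inv_cdf(TARGET_PERCENTILE / 100.0)
    needed = average + (target_pra - BASE_PRA) / coefficient
    new_value = min(needed, maximum)
    new_pra = BASE_PRA + coefficient * (new_value - average)
    return new_value, percentile(new_pra)


results = {name: sensitivity(*args) for name, args in VARIABLES.items()}
```

Under these assumptions NCOE and PQSCR reach their maximum values before the target percentile is attained, in line with the starred entries of Table XVIII, while OAFQT needs a value of roughly 74 to 75.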


Interpretation of the coefficient values clearly 
demonstrates that HIYRED is the most important explanatory 
variable. This observation is understandable since the 
structure of the variable is discrete, and changes to 
adjacent values represent major distinctions in educational 
background. For example, shifting from a value of six to a 
value of seven represents the difference between having a high 
school degree and having gone to one year of college. In 
percentages of HIYRED, that constitutes moving from a large 
center group of high school qualified NCO’s, to the upper 
ninety percent of the HIYRED distribution. 
OAFQT is the second most significant explanatory variable. 
A shift of roughly one quarter of its range, i.e. 45 to 75, 
can change PRA plus or minus five percent. The other 
explanatory variables NCOE and PQSCR have considerably less 
influence on the dependent variable. 
d. Checking of Assumptions 
To verify the requirements for the regression 
model, residual analysis was performed using the Grafstat 
program. Representative plots of the OAFQT residual are 
shown in Figures 4.18 and 4.19. 
The histogram of residuals, shown in Figure 4.18, 
demonstrates that the residual distribution is approximately 
normal. Homoscedasticity is checked in Figure 4.19, in which 
residuals have been plotted against the OAFQT variable. 
There do not appear to be any patterns in the plots of the 
residuals, and the uniform pattern was considered sufficient 
to justify the assumption of homoscedasticity. Lastly, since 
each observation represents a different person, the 
independence of each observation from one another is assumed 
true. 
e. Confirmation of Regression Findings 
(1) Second Data Set. Regression analysis was 
conducted on the second partition of the data set. A 


comparison of those results with the first data set is shown 


in Table XIX. 


TABLE XIX Comparison of Regression Data Sets 


Dependent Variable: PRA 

             1st Set                2nd Set 
Estimator    Coeff     (Std Err)    Coeff     (Std Err) 
OAFQT        .004260   (.00025)     .004729   (.00032) 
HIYRED       .139483   (.00493)     .131559   (.00636) 
PQSCR        .003272   (.00046)     .003197   (.00060) 


The above results are felt to be sufficiently comparable 

to accept the original model coefficient scores. 
(2) Nonparametric Regression. Since the model 
contained an ordinal variable, HIYRED, a regression result 
using nonparametric terms was included as a confirmatory 
measure. Nonparametric regression produced the same linear 
least squares approximation for the model estimates, so the 
regression coefficient for HIYRED was still 0.1395. However, 
for nonparametric regression the test for the significance of 
the estimate value used the Spearman rank correlation 
coefficient. The regression coefficient for HIYRED was 
tested using this procedure. 

First, for each pair of PRA and HIYRED values a residual 
value U was found by computing U = PRA - (0.1395 * HIYRED). 
Then, the Spearman rank correlation coefficient, rho, was 
computed, based on the ranks of HIYRED and the ranks of U. It 
was found to be 0.02482 with a Pr>|R| of 0.0001. In this test 
the null hypothesis was that the value of the regression 
coefficient was equal to 0.1395, the value found in 
regression. CRef. 1038p Om 265-27 a To test the null 


hypothesis, that the regression coefficient estimate is 
correct, rho was compared against a rejection region computed 
using the two tailed Spearman Quantile, with a normal 
approximation. The rejection regions for this Spearman 
Correlation parameter were values less than 0.0085 or greater 
Ghani g915. Since the value of rho did not fall inside 
either rejection region, the null hypothesis could not be 
rejected, and a HIYRED regression coefficient of .1395 was 


acceptable. 
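
The mechanics of this check, subtracting the fitted HIYRED contribution from PRA and rank-correlating the remainder with HIYRED, can be sketched in Python as follows. This is a minimal illustration and not the procedure of the cited reference: the function names are hypothetical, ties are given average ranks, and the five data points are invented solely to exercise the code.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def slope_check(hiyred, pra, slope=0.1395):
    """Rank-correlate HIYRED with U = PRA - slope * HIYRED."""
    u = [p - slope * h for h, p in zip(hiyred, pra)]
    return spearman_rho(hiyred, u)


# Invented illustration data, not taken from the study.
hiyred_demo = [1, 2, 3, 4, 5]
noise_demo = [0.2, -0.1, 0.0, 0.1, -0.2]
pra_demo = [0.1395 * h + e for h, e in zip(hiyred_demo, noise_demo)]
rho_demo = slope_check(hiyred_demo, pra_demo)
```

If the hypothesized slope is right, U should carry no remaining monotone relationship with HIYRED, so rho should fall near zero; the decision rule then compares rho with quantiles of its null distribution.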
f. Testing the Model 
The model coefficients found by regression were 
tested in two ways. First, a predicted promotion rate value 
was computed for the extremes and average of the model. The 
extreme values used the minimum or maximum values for the 
input variables. The average promotion rate was computed 
using sample averages for all input variables. The resulting 
predictions were then compared against the actual 
distribution percentiles. 
Secondly, subsets of the sample population had average 
promotion rates predicted using categorical values and sample 
population averages. The resulting predictions were compared 
against the actual sample values. Again percentile values 
for PRA were found by using a standard normal table 
approximation. 


TABLE XX Comparison of Extreme and Average Predictions 


           Model                        Data 

Minimum Prediction                      Sample Percentile 
PRA Value    Percentile                 PRA Value    Percentile 
-1.0009      15... 4x                   -1.558       5% 
(21000)      (3.5247 

Maximum Prediction                      Sample Percentile 
PRA Value    Percentile                 PRA Value    Percentile 
1.23029      89.1%                      1.7866       95% 
(.4098)      (9.9%) 

Average Prediction                      Sample Percentile 
PRA Value    Percentile                 PRA Value    Percentile 
0.01839      50.7%                      -0.04146     50% 
(.2223)      (8.5%) 
The model predictions were very accurate at the average 
level, but this accuracy diminished at the extremes. 

The second test for the model was one where specific 
population subcategories had their average PRA value 
predicted. The subcategories represented were four 
combinations of SEX and the black and white RACETH variables. 
Additionally, predictions were made to check the average 
promotion rate of all NCO’s with a HIYRED value of 10, and 
all NCO’s with an OAFQT of 85. As in the previous table, 
unless the input variable is being used as a subcategory, its 
value was set to the overall population average. Table XXI 
shows the results of the predictions. 


TABLE XXI Comparison of Predicted vs Actual PRA Averages 


Subcategory     Predicted %       Sample %    Sample Size 
                (Lower-Upper) 
Male/White      55.2              5S ae       18,003 
                (45.7-64.2) 
Male/Black      49.5              44.3        Zee 
                (40235305 40) 
Female/Black    47.3              47.7        2,485 
                (37.7-56.1) 
Female/White    Bi2a?             aoe S       1,842 
                (44.1-61.5) 
HIYRED=10       eat Ve Sgial, 96.9 
                UG Seoay 9.3) 
OAFQT=85*       57.4              60.2        Zee 
                (44.7-69.4) 

*The sample data point estimate was averaged over a 
range of OAFQT 80 to 90. 
Testing of the regression model indicates that it was 
reasonably effective if used with input changes of the 
nominal variables, such as SEX and RACETH. Changes in the 
value of HIYRED produce reliable estimates, and demonstrated 
the considerable contribution of this variable as a predictor 
of PRA. The continuous variable OAFQT is difficult to test; 
since it is a continuous variable the model estimate was 
taken over a range of values. Predicted results are close to 
the sample value, but the variance of the estimate still 
spans the median. OAFQT does move the predicted values of 
PRA in the right direction, but its effectiveness is severely 
hampered by its variance and diminishing ability to provide 
an accurate prediction value as PRA approaches either 
extreme. Other prediction estimates were attempted using 
OAFQT and their results demonstrated the same lack of 
predictive ability away from the center percentiles. 

g. Summary of Regression Analysis 
Regression analysis provided estimates of the 
independent contribution of several key variables to 
predicting a promotion rate. They include a measure of 
intelligence aptitude, OAFQTP, a measure of academic ability, 
HIYRED, two measures of military performance, PQSCR and NCOE, 
and two nominal values, SEX and RACETH. 

Testing of these estimates shows that the predictive 
ability of the model is limited to those variables which have 
very distinct abilities to subcategorize the sample 
population. These variables are the SEX, RACETH, and HIYRED 
variables. The continuous variables OAFQT and PQSCR cannot 
be relied upon to independently yield estimates of PRA, but 
can affect limited shifts of the PRA estimate within a 
subcategory. 


E. SUMMARY OF FINDINGS 

Chapter IV was the principal analytical exercise in this 
study. It progressed through ascending stages of analysis 
and resulted in an inferential model with a restricted and 
independent set of explanatory variables. These explanatory 
variables did, in fact, rely on levels of intelligence tests 
and academic background as values to predict promotion. 

The model, however, demonstrated only limited utility as a 
predictive equation. It could only match the sample data when 
it was describing an average promotion rate among a large 
population subcategory. This would occur only where the 
change in the explanatory variable had a significant 
partitioning effect on the population. 

The next two chapters will investigate the relationship of 
intelligence and academic ability as a predictor of promotion 
rate but through different procedures. 


V. ANALYSIS OF TOP PERFORMERS 


A. INTRODUCTION 

This chapter takes an ad hoc approach to identify any 
trends which distinguish top performers, on the basis of 
promotion rate, from their peers. Top performers consist of 
the top three percent of the population, or 1,047 
individuals, according to PRA scores. This data set is 
referred to as the TOP data set, while the remainder is 
referred to as the SAMPLE data set. 

Analysis consists of three sections. The first section 
is a comparative tabulation of means and variances. Results 
shown in this section confirm the majority of sample 
characteristics predicted in Chapter IV, such as higher 
EIMCAT and OAFQT scores. There were, however, discrepancies 
with respect to TOP distribution values of RACETH, NCOE and 
PAYGD. Those discrepancies are investigated in later 
sections of this chapter. The second section reports the 
results of formal hypothesis testing for differences in means 
for each of the explanatory variables. The last section 
investigates the discrepancies associated with RACETH, NCOE, 
and PAYGD. Through a presentation of graphics demonstrating 
internal shifts of those variable distributions, an effect 
which appears to interrelate the three distributional 
discrepancies is identified. 


B. COMPARISON OF MEANS AND VARIANCE 

The tabulated means and variances of the study variables 
for the top three percent and for the remainder of the entire 
sample are presented in Table XXII. The last column in the 
table shows the percentage and direction that the TOP data 
set differed from the SAMPLE. 


TABLE XXII Top vs Sample Summary Data 


Variable/Type    Top      Sample    Sample     Comment 
                 Mean     Mean      Std Dev 

Promotion 
RATE             2.06     0.00      1.00 
PRATE            ee S 2057 me nO 
PRA              Cro. S 7350 0.00 ro O 

Intelligence 
AFQTP            64.                           > 
OAFQTP           Sul.                          > 
EIMCAT           6.                            > 
GTSCR            ie 3                          > 
HIYRED           6.                            > 
EDLVL            ne                            > 
PQSCR            80.                           > 
NCOE             aes                           < 

Effects 
SEX 
CMF 
RACETH 
PAYGD 


Observations derived from the data in Table XXII can be 
summarized as follows: 

The four aptitude test variables, GTSCR, AFQTP, OAFQTP and 
EIMCAT, all demonstrate a strong positive difference between 
the TOP and SAMPLE scores. The AFQT related scores are about 
twenty percent greater, with GTSCR greater by four percent. 
The variables EDLVL and HIYRED were both positive, with 
HIYRED slightly larger at twelve percent. PQSCR increased 
slightly. 

The effects variables SEX and CMF both increased, with 
CMF demonstrating a significant increase. The change in CMF 
was an unexpected result of subsetting to the top three 
percent. The PRA variable was designed to be independent of 
CMF, and it should not have been affected as significantly as 
it was. 

The only variables which decreased in proportion between 
SAMPLE and TOP were NCOE, RACETH, and PAYGD. Of the three, 
NCOE showed the largest decrease. The change in NCOE was also 
an unexpected result. Regression analysis indicated that NCOE 
had a positive influence on PRA. To have NCOE decrease with 
top performers is the reverse result. Paragraph D of this 
chapter will attempt to explain the reason for this anomaly. 


C. SIGNIFICANCE TESTING 

Significance testing for means of the explanatory 
variables between the TOP and SAMPLE data set was included as 
a formal statistical confirmation of differences between the 
two data sets. Testing using nonparametric methods was 
utilized since the study variables were either discrete, or 
if continuous, did not meet the Kolmogorov-Smirnov one-sample 
test for a normal distribution. The type of nonparametric 
test used is dependent on the type scale of the variable and 
whether it was continuous or discrete. 

TABLE XXIII Top vs Sample Hypothesis Results 


Variable    Test Used                       Results 

Intelligence 
GTSCR       Kruskal-Wallis Test (1)         Chisq = 671      Strongly reject H0: 
AFQTP       Kruskal-Wallis Test             Chisq = tlOS     Strongly reject H0: 
OAFQTP      Kruskal-Wallis Test             Chisq = 1418     Strongly reject H0: 
EIMCAT      2 x C Contingency Table (2)     Chisq = 503      Strongly reject H0: 
HIYRED      2 x C Contingency Table         Chisq = 931      Strongly reject H0: 
EDLVL       2 x C Contingency Table         Chisq = 700      Strongly reject H0: 
PQSCR       Kruskal-Wallis Test             Chisq = 26.185   Reject H0: 
NCOE        2 x C Contingency Table 

Effects 
SEX         2 x C Contingency Table         Chisq = 
CMF         2 x C Contingency Table         Chisq =          Strongly reject H0: 
RACETH      2 x C Contingency Table         Chisq =          Reject H0: 
PAYGD       2 x C Contingency Table         Chisq =          Strongly reject H0: 


(1) For this nonparametric test the null hypothesis is that 
the populations are identical. The alternate hypothesis is 
that one of the populations yields larger observations. With 
two populations this is equivalent to a Mann-Whitney test. 
At a level α of .95 the critical Chi-square value for 
rejection is Chisq > 3.84. 


(2) For this nonparametric test the null hypothesis is that 
the two populations have the same distribution as measured by 
the probability of falling into one of the discrete variable 
classifications. The alternate hypothesis is that the 
distributions are different. The contingency table is set 
for the two rows to be the classification of PRA > 1.93 and 
PRA < 1.93; the C represents the number of discrete levels in 
the variable being tested. The Chi-square test statistic is 
also used for this test, with a rejection of H0: when Chisq is 
larger than 3.84 at a .95 level α. 
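
The Chi-square statistic for a 2 x C contingency table can be computed as in the Python sketch below. It is a generic Pearson Chi-square calculation, not a reproduction of the software used in the study; the function name and the small table of counts are hypothetical. Note that 3.84 is the .05 critical value for one degree of freedom, whereas a 2 x C table has C - 1 degrees of freedom, so the appropriate cutoff grows with the number of levels.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts, e.g. two rows
    (PRA above and below the cutoff) by C columns (variable levels).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts for a two-level variable (a 2 x 2 table).
stat_demo, dof_demo = chi_square_statistic([[10, 20], [20, 10]])
```

For this invented table the statistic is about 6.67 on one degree of freedom, which exceeds 3.84 and would lead to rejection of the null hypothesis of identical distributions.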




Hypothesis testing confirms the observations made on 
simple means and variances of the study variables. The 
strength of the difference can be interpreted by the 
magnitude of the Chi-square statistic. 


D. ANALYSIS OF DISTRIBUTIONS 

This section further investigates the shifts in 
distributions for those variables which conflicted with the 
relationships derived in regression and correlation analysis. 
Those variables were CMF, NCOE and PAYGD. Again, the 
conflicts which arose were two-fold. 

First, neither CMF nor PAYGD should have been affected by
subsetting of the PRA variable. The PRA scores are normalized
differences from the average score for every paygrade and CMF
combination. Assuming a uniform application of promotion
policy, then, no one CMF or paygrade should have dominated as
a result of subsetting to the top three percent. Secondly,
NCOE should have increased slightly rather than decreased
significantly by subsetting to the top three percent.
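
The normalization and subsetting step can be pictured with a short sketch. The exact normalization used for PRA is not restated here, so the version below is an assumption: each raw rate is expressed as its difference from the mean of its paygrade and CMF group, divided by that group's standard deviation. The record layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalized_scores(records):
    """records: list of (paygrade, cmf, raw_rate) tuples.

    Returns one score per record: the difference from the group mean,
    scaled by the group standard deviation (assumed normalization).
    """
    groups = defaultdict(list)
    for paygrade, cmf, rate in records:
        groups[(paygrade, cmf)].append(rate)
    stats = {key: (mean(rates), pstdev(rates)) for key, rates in groups.items()}
    scores = []
    for paygrade, cmf, rate in records:
        group_mean, group_sd = stats[(paygrade, cmf)]
        scores.append((rate - group_mean) / group_sd if group_sd > 0 else 0.0)
    return scores


def top_fraction(records, scores, fraction=0.03):
    """Return the records whose scores fall in the top `fraction`."""
    keep = max(1, round(len(records) * fraction))
    ranked = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    return [records[i] for i in ranked[:keep]]


# Hypothetical records: (paygrade, CMF, raw promotion rate)
records = [("E-5", "11", 1.0), ("E-5", "11", 3.0),
           ("E-6", "71", 10.0), ("E-6", "71", 14.0)]
scores = normalized_scores(records)
print(scores)
print(top_fraction(records, scores, fraction=0.5))
```

If the normalization fully removed group effects, the paygrade and CMF mix of the top subset would be expected to resemble that of the whole sample.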

The three inconsistencies appear to be linked in their
distributional change. Figures 5.1, 5.2, and 5.3
demonstrate this.

TOP VERSUS SAMPLE CMF CHANGES IN PERCENT
(bar chart: PERCENTAGE by CMF)

Figure 5.1

Figure 5.1 demonstrates a clearly defined redistribution of
CMF percentages away from combat arms MOS’s to the combat
service support MOS’s. In particular, Infantry, Artillery,
and Armor MOS’s lost a total of 15.5 percent, while the
Administrative Specialists (CMF 71) gained almost 9 percent.
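
The percentage changes plotted in Figure 5.1 amount to comparing two category distributions. A minimal sketch of that comparison follows; the CMF codes are real labels from the study, but the counts are hypothetical and the function names are invented for the example.

```python
from collections import Counter


def percent_distribution(categories):
    """Percentage of observations falling in each category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


def percent_shift(sample, top):
    """Change in percentage points from the full sample to the top subset."""
    sample_pct = percent_distribution(sample)
    top_pct = percent_distribution(top)
    keys = sorted(set(sample_pct) | set(top_pct))
    return {key: top_pct.get(key, 0.0) - sample_pct.get(key, 0.0) for key in keys}


# Hypothetical CMF codes for the whole sample and for the top subset
sample_cmf = ["11"] * 50 + ["71"] * 50
top_cmf = ["11"] * 1 + ["71"] * 3

print(percent_shift(sample_cmf, top_cmf))
```

A negative entry marks a CMF that is under-represented among top performers relative to the sample, and a positive entry marks one that is over-represented.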


TOP VS SAMPLE NCOE
(cluster bar chart: PERCENT by NCOE level, TOP and SAMPLE)

Figure 5.2


Figure 5.2 demonstrates transfer of a large percentage of
the sample density away from the NCOE 7 to the NCOE 0 level.
This was consistent with the observations in Figure 5.1,
since only combat arms NCO’s qualify for level 7, the Combat
Arms Primary Leadership course.


TOP VS SAMPLE PAYGD
(cluster bar chart: PERCENT by PAYGD, TOP and SAMPLE)

Figure 5.3

The last figure, Figure 5.3, shows a displacement of
percentage from the E-6 to the E-5 paygrade as a result of
extracting only the top three percent by measure of promotion
rate.

To offer an explanation of the underlying reason for
these discrepancies is difficult. Some measure of this
discrepancy may well be explained in that the removal of
effects by normalizing the PRA scores was not entirely
adequate. The observed discrepancy may be simple
mathematical error. However, it can be noted that their
interrelationships do act consistently. Specifically, the
reduction in paygrade and combat MOS’s both combine to
significantly reduce the NCOE level. As such, it is more
likely that the change in NCOE occurred coincident with the
changes in the two variables PAYGD and CMF. The effect being
demonstrated was one where junior combat service support
NCO’s were dominating promotion achievement.


E. SUMMARY OF FINDINGS

Comparing the changes in averages for the top performers
to the regression coefficients found in Chapter IV shows
very substantial agreement. Specifically, OAFQT was the most
significant intelligence test variable, while HIYRED was the
most significant academic variable. Although the percent
change in OAFQT is greater than that in HIYRED, OAFQT still
has considerably more variance than HIYRED. Thus, the predictive
ability of HIYRED in regression should be more pronounced
than that of OAFQT. The less significant variables of
PQSCR, SEX and RACETH each shifted a small, significant
amount in the appropriate direction.

The only discrepancy between the two procedures is the 
change in the variable NCOE. This change is felt to have 
been induced by changes in the CMF and PAYGD distributions. 
The effect is one where junior combat service support NCO’s 
replace NCO’s from the combat MOS’s. 

An important observation from analysis of the top three
percent was that the increase in the value of any explanatory
variable was not extreme. In fact, the largest increase was
only twenty-five percent. As an inference, it appears that
NCO’s who do a little better in a combination of areas,
rather than much better in a single area, are more likely
recipients of faster promotion rates.

VI. PRINCIPAL COMPONENTS AND FACTOR ANALYSIS


A. INTRODUCTION

In this chapter more advanced statistical procedures are 
implemented to better summarize the independent variables, 
and improve or at least simplify the cause-effect model. 
Principal components and factor analysis are two closely 
related procedures which are normally used in investigating 
the mutual relationships and communalities of a large number 
of variables. By identifying redundant variables, and by
constructing composite variables of the originals, it is
possible to reduce the number of independent explanatory
variables to only those which are significant and unique.


B. THEORY

Principal components and factor analysis each use matrix
algebra to operate on a P by P matrix of correlation or
covariance coefficients and produce a system of eigenvectors
of the form:

$Y_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p + E$

In the notation, $Y_i$ represents the resultant composite
variable, which is the linear combination formed with the
loading coefficients $a_{ij}$. These loading coefficients
multiply each of the original variables $X_j$, where
$j = 1, \ldots, p$. $E$ represents the amount of residual error
not accounted for by the linear model. [Ref. 15] The
resulting eigenvectors represent a set of orthogonal
components jointly perpendicular in the space of the original
variables. [Ref. 15:p. 424] These components are jointly
uncorrelated and individually account for levels of variance,
where the first principal component accounts for the largest
proportion, and the last principal component accounts for the
smallest. A resulting component may be representative of
some aggregate characteristic of the original
variables. For example, a resulting eigenvector which has
strong factor loadings for original variables of physical
strength and endurance could be called a factor of stamina as
an aggregate measure. Principal components and factor
analysis differ in that principal components assume and
require that a number of components equal to the number of
initial variables is needed to account for the total
variance. In contrast, the factor method assumes that there
exists a set of composites in a dimension smaller than the
dimension of the original number of variables which will
suffice. [Ref. 15]
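
To make the eigenvector description concrete, the sketch below extracts principal components from a small correlation matrix. It is an illustration only: the 3 by 3 matrix is hypothetical, and power iteration with deflation is used here because it needs nothing beyond the standard library; it is not a description of how SAS computes its solution.

```python
import math


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def normalize(vector):
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def leading_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    # Asymmetric start vector so it is not orthogonal to the target by accident
    vector = normalize([float(k + 1) for k in range(len(matrix))])
    for _ in range(iterations):
        vector = normalize(mat_vec(matrix, vector))
    eigenvalue = sum(v * w for v, w in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def principal_components(correlation, count):
    """Return `count` (eigenvalue, eigenvector) pairs, largest first."""
    matrix = [row[:] for row in correlation]
    pairs = []
    for _ in range(count):
        eigenvalue, vector = leading_eigenpair(matrix)
        pairs.append((eigenvalue, vector))
        # Deflation: remove the component just found from the matrix
        for i in range(len(matrix)):
            for j in range(len(matrix)):
                matrix[i][j] -= eigenvalue * vector[i] * vector[j]
    return pairs


# Hypothetical correlations: variables 1 and 2 are closely related
correlation = [[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.1],
               [0.1, 0.1, 1.0]]

components = principal_components(correlation, 3)
for eigenvalue, vector in components:
    proportion = eigenvalue / len(correlation)
    print(f"eigenvalue {eigenvalue:.4f}  proportion {proportion:.4f}")
```

The eigenvalues of a correlation matrix sum to the number of variables, which is why each proportion is the eigenvalue divided by that count.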

An additional aspect of factor analysis is that it allows
for rotation of the solution with the intent of developing
more unique and well-defined components. For example, if
there are five variables in a factor which have intermediate
loading factors in the range .2 to .4, a rotation of common
factors by applying nonsingular linear transformations may
result in a pattern matrix in which the loadings are either
zero or close to one. The end result is easier to interpret
than the factor with numerous mixed elements. Graphical
measures are useful with the rotation procedure and allow the
analyst to see the relative uniqueness of the input
variables.
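
The effect of rotation can be shown with a two-factor sketch. Several choices here are assumptions made for illustration: the loadings are hypothetical, the rotation is restricted to an orthogonal one (a special case of the nonsingular transformations mentioned above), and the angle is chosen by a simple grid search on the varimax criterion rather than by any particular package's algorithm.

```python
import math


def rotate(loadings, angle):
    """Rotate two-factor loadings by `angle` radians (orthogonal rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    total = 0.0
    for column in zip(*loadings):
        squares = [x * x for x in column]
        mean_square = sum(squares) / len(squares)
        total += sum((q - mean_square) ** 2 for q in squares) / len(squares)
    return total


def best_rotation(loadings, steps=900):
    """Grid search over [0, 90) degrees for the angle maximizing the criterion."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))


# Hypothetical clean structure, deliberately mixed by a 30 degree rotation
simple = [(0.9, 0.0), (0.8, 0.0), (0.0, 0.7), (0.0, 0.85)]
mixed = rotate(simple, -math.pi / 6)

angle = best_rotation(mixed)
recovered = rotate(mixed, angle)
print(f"rotation angle: {math.degrees(angle):.1f} degrees")
for before, after in zip(mixed, recovered):
    print(f"{before[0]:6.3f} {before[1]:6.3f}  ->  {after[0]:6.3f} {after[1]:6.3f}")
```

The communality of each variable (the sum of its squared loadings) is unchanged by the rotation; only the way it is divided between the factors changes.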


C. RESULTS

The SAS procedure for performing factor analysis was used 
with the method of factor determination being the principal 
component method. As such, basic principal component 
analysis was conducted, but limits were applied on the number 
of factors retained so that only the most significant 
composite factors would be kept. The first set of input
variables included all twelve study variables. Table
XXIV shows the resulting factor solution. Appended below
each component is an interpretation explaining what the
aggregate factors represent. The original input variables
which contributed most to the factor have been underlined.
Following Table XXIV is a factor plot, Figure 6.1, where
each of the variables is coded by a letter. By observing the
plot, any lack of uniqueness for a group of variables can be
noted where the coded letters are close to one another.








TABLE XXIV   Principal Components Tabular Results

Input Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

                 1       2       3       4       5       6
EIGENVALUE  4.0052  1.7334  1.4979  1.0634  0.8496  0.8028
DIFFERENCE  2.2717  0.2355  0.4344  0.2138  0.0468  0.0486
PROPORTION  0.3338  0.1445  0.1248  0.0886  0.0708  0.0669
CUMULATIVE  0.3338  0.4782  0.6031  0.6917  0.7625  0.8294

                 8       9      10      11      12
EIGENVALUE  0.5392  0.3500  0.2809  0.1196  0.0034
DIFFERENCE  0.1892  0.0691  0.1613  0.1162
PROPORTION  0.0449  0.0292  0.0234  0.0100  0.0003
CUMULATIVE  0.9372  0.9663  0.9897  0.9997  1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION
FACTOR PATTERN 
FACT1 FACT2 FACT3 FACT4 FACT5 FACT6
EDLV Er" 4362 eksierl ~9024 =—.25449 »-2.C0674 - 70693 
APOT Pie 9515 igs 3 sie > . 06375-20075 .1548 
EIMCAT .9060 -1220°, =21652 -—. 0596 see e09c .1478 
NCOE 53>. 0085 ~4507 .6668 2927 =,0sge .0084 
HIYRED 3834 .6410 14176 =2326 1 Ooo eee Oose 
SEX 1735 BAe Weel Vis Cole ~K657 -=.0736 
OAFOT 9518 LOS GRRE loo -OS9Omer— 20092 ESOS 
GiSser 7 7820 sl126 .C090 .0331 -.0464 wea 0 
PQSCR 4001 ~2413 -1205° = 1 USOp —., Ae 2. Ga 4527 
CMF L677 -5200 -.1449 49858 ==. 1.1 Sa 2537 
PAYGD .1216 - 3467 O77 O ~3367  -.1816 -.08435 
RACETH=. 3590 3130 ~2547 wl223 .4708 -6567 
Intell Acad Career Sex PQSCR RACE 
Tests Status 
FINAL COMMUNALITY ESTIMATES: TOTAL = 10.706622 
                 7
EIGENVALUE  0.7542
DIFFERENCE  0.2149
PROPORTION  0.0628
CUMULATIVE  0.8922


FACT7 


O29 


.024 
Oa 
.134 
.124 
2550 
1028 
132 
115 
- 261 
Lom 
P2816 

PLOT OF FACTOR PATTERN FOR FACTOR1 AND FACTOR3

EDLVL=A AFQTP=B EIMCAT=C NCOE=D HIYRED=E SEX=F
OAFQT=G GTSCR=H PQSCR=I CMF=J PAYGD=K RACETH=L

Figure 6.1 Factor Plot

The results appear to be quite reasonable, where the most
significant factor is a composite of all the mental aptitude
measures: OAFQT, AFQTP, GTSCR, and EIMCAT. The second
factor consists primarily of academic performance measures,
EDLVL and HIYRED. The third factor is composed of NCOE and
PAYGD and reflects two closely related measures dominated by
paygrade. The fourth factor is predominantly a measure of
SEX and two other nominal variables, CMF and PAYGD. The
fifth, sixth and seventh factors all appear to be dominated
by single variables, PQSCR, RACE, and CMF respectively.


In short, each of the original twelve variables is in
some measure represented in the five factors, the first five
factors accounting for over seventy-five percent of the
variance. By observing the entry for PROPORTION one can see
that the subsequent seven factors each contributed between
.0668 and .0028 of the variance and as such are not major
contributors.
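
The PROPORTION and CUMULATIVE entries follow directly from the eigenvalues, and the arithmetic can be checked with a short sketch. The eigenvalues are those listed in Table XXIV (the seventh is read from the detached seventh column of the listing); the variable names are chosen only for this example.

```python
# Eigenvalues 1 through 12 as listed in Table XXIV
eigenvalues = [4.0052, 1.7334, 1.4979, 1.0634, 0.8496, 0.8028,
               0.7542, 0.5392, 0.3500, 0.2809, 0.1196, 0.0034]

# For a correlation matrix the total variance equals the number of variables
number_of_variables = len(eigenvalues)

proportions = [value / number_of_variables for value in eigenvalues]

cumulative = []
running_total = 0.0
for proportion in proportions:
    running_total += proportion
    cumulative.append(running_total)

print(f"first five factors: {cumulative[4]:.4f} of the variance")

# With seven factors retained, the total final communality is the
# sum of the seven retained eigenvalues.
retained_total = sum(eigenvalues[:7])
print(f"total communality for seven retained factors: {retained_total:.4f}")
```

The second figure agrees, to rounding, with the FINAL COMMUNALITY ESTIMATES total of 10.706622 printed under the factor pattern.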

Using the results of the first solution, a second analysis
was conducted with a reduced number of input variables. In
each of the initial solution factors the single variable
having the largest loading factor was selected and the other
related variables were eliminated. Table XXV shows the
results of that solution, and Figure 6.2 shows the Factor
Plot.


TABLE XXV   Reduced Principal Components Tabular Results

Input Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

                 1       2       3       4       5       6       7
EIGENVALUE  2.1666  1.2063  1.0019  0.8703  0.8049  0.7081  0.2416
DIFFERENCE  0.9602  0.2044  0.1315  0.0654  0.0967  0.4665
PROPORTION  0.3095  0.1723  0.1431  0.1243  0.1150  0.1012  0.0345
CUMULATIVE  0.3095  0.4819  0.6250  0.7493  0.8643  0.9655  1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN 
FACT1 FACT2 FACT3 FACT4 FACT5 FACT6 FACT7
NCOE poe el -,5422 ~6941 -2656 -.3801 -.1071 .018 
HIYRED . 3659 oso 7335 -5162 -.2443 -.4001 -.004 
SEX Ise 265.32 .1514 .6993 0899 -~.1346 -.051 
OAFQT .8945 .0404 .0412 2502 —=.0668 .2462 -.328 
GTSCR .8592 Ose 4 .0154 Mea o2 J. 259 - 3664 -.328 
PQSCR .5069 <370 7 eos <(6(enbe' -7/141 -.2648 -.022 
RACETH -.4521 -o275 20799 .1589 - 2487 oo 1. .037 
Intell Acad NCOE SEX PQSCR Race 
Tests 
FINAL COMMUNALITY ESTIMATES: TOTAL = 7.000000
  NCOE  HIYRED     SEX   OAFQT   GTSCR   PQSCR  RACETH
1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

PLOT OF FACTOR PATTERN
(vertical axis: FACTOR1)

NCOE=A HIYRED=B SEX=C OAFQT=D GTSCR=E PQSCR=F RACETH=G

Figure 6.2 Factor Plot

Restricting the input to the strongest unique variables
results in an almost complete separation into single factors.
The only exception is the grouping of GTSCR and OAFQT (E and
D). This is not surprising considering that both scores are
composed from the same set of tests in the ASVAB. Thus,
the decision to eliminate GTSCR from earlier regression
models makes sense from the Factor Analysis perspective as
well.

E. SUMMARY OF FINDINGS

The application of principal components and factor
analysis confirmed many of the patterns of dependency and
redundancy within the study variables. It confirmed the
choices for unique variables in the regression as developed
in Chapter IV, and gave a good second opinion for deciding
which variables could be set aside with little effect on the
model.

VII. CONCLUSION


A. OVERALL FINDINGS 

There is strong statistical evidence to support the
proposition that success in the Army, as measured by
promotion rate, is related to the individual’s intelligence
test scores and previous academic background. The
explanatory variables of the 1980 normed AFQT score and the
individual’s highest year of education at time of entry are
the most important indicators for a future promotion rate.
The highest year of education at time of entry is the more
important measure, but changes in its discrete scale
represent very substantial changes in academic background.
OAFQT is not nearly as important as HIYRED and can
independently affect the predicted promotion rate only up to
ten percent.

While in service, how well the individual scores on his 
Performance Qualification Test Scores and his attendance at 
NCO schooling will be indicative of a faster promotion rate. 

The statistical evidence for these observations can be
argued by showing the existence of significantly increasing
promotion rate averages across ascending levels of
explanatory measures in ANOVA and ANCOVA analysis. This
argument can be supplemented, and those differences seen more
concretely, by a simpler comparison of top performers versus
the sample averages.

Considerable variance of promotion rate exists across any
of the levels of the discrete explanatory variables, and
within any of the categorical variables. There is a dilemma
in designing an effective dependent variable. While
controlling categorical variables such as CMF and paygrade,
the effects of the other variables become more apparent and
significant. However, the ability of the model to explain
variance is significantly diminished.

Selecting a set of the most important and unique
explanatory variables was achieved via two methods. A
successive, increasing dimension procedure distilled a set of
unique explanatory variables. This method relied upon
developing detailed familiarity with each variable. In the
process hypothesis testing was used to eliminate
insignificant contributors and identify the most important
variable from a group of related variables. This restricted
set of explanatory variables was confirmed with the use of
principal components, a method which uses a mathematical
approach to identify orthogonal and unique variables.

When using inferential procedures the resulting model
met regression assumptions, both parametrically and
nonparametrically. Further, the model estimates are
reproducible with an alternate data set.

Although the model is technically acceptable, it is only
accurate in predicting promotion values for population
subcategories. The low R² value and high mean square error
terms found during regression were manifested in model
testing. When making predictions based on incremental
changes in AFQT the sample data values were close, but upper
and lower bounds were so large that resulting predictions
were not useful.

The poor performance of the predictive model can be
attributed to two possible reasons. First, there may exist
some unspecified predictor variable which could be used to
better account for variance. Secondly, there may exist
significant inexplicable chance in the occurrence of a
promotion rate for any given individual.

In the case of the first reason, it should be observed
that the number of available entries held on a given
individual at either DMDC or MILPERCEN is limited. Of the
one hundred and forty data fields, this study considered all
entries which were felt to have potential merit as an
explanatory variable. This included several versions
expressing the same fundamental quality. Of the twelve
variables considered, the final number of significant
variables was reduced to only six. Overall, there are few
significant and unique measures available to use as
predictors. To discover additional explanatory variables
would require the inclusion of new personnel data elements in
those data bases. Potential candidates include evaluation
report averages or, possibly, the results of a personality
composite test. Alternatively, the quality of information on
academic performance could be increased, such as by the
inclusion of grade averages from high school attendance
periods. The utility of this additional data would then have
to be evaluated in a manner similar to this thesis.

The second reason given for error is a more probable
explanation, for the subject matter of this study is people,
and not a more deterministic physical phenomenon. The
resolution of a cause-effect relationship is more subtle and
more difficult to verify. Although this condition does not
have a mathematical remedy, the judgement of whether or not
even a small, highly variable measure of trend is sufficient
still lies with the analyst and his ability to present that
judgement to decision makers.


B. POLICY RECOMMENDATIONS 

The first question that must be answered in this section
is whether or not having a predictive model is necessary to
make policy decisions regarding promotion or accession. The
answer offered in this document is that it is not. There is
sufficiently reliable information resulting from hypothesis
testing and subpopulation analysis to make cogent
observations and decisions.

From the results of this investigation, accession policy
makers should closely manage the two attributes of OAFQT and
HIYRED. This recommendation is more a confirmation than a
proposal. The 1984 Defense Authorization Act already
places constraints on AFQT category and high school diploma
status.

The two in-service attributes that should be managed are 
the Performance Qualification Score, and attendance at NCO 
schooling. To directly tie scores on these attributes in the 
form of promotion points or a minimum threshold scale would 
be one approach. Unfortunately, this may artificially force 
NCO’s of less potential and aggressiveness into categories 
with the more competent individuals. The result may be a 
lessening of the discriminatory effectiveness of the two 
measures. 


If the individual were allowed to achieve his or her
score and pursue in-service education independent of
promotion policy, the ability of these variables to
discriminate would be better. However, not tying these
scores directly to promotion point values or thresholds
should not mean that either measure would be unused. A
policy where promotion boards were still instructed to review
an individual’s scores, together with notification of this
review policy to the NCO population, allows for self selection
by the more ambitious individuals.


C. SUGGESTIONS FOR FURTHER RESEARCH 

One disturbing observation of this study was the apparent
disparity among race and ethnic groups in terms of AFQT and
promotion rates. As pointed out by Daula (1985), the
explanation of this disparity cannot be seen in an aggregate
promotion data approach, but rather requires a duration model
approach with a set group of individual soldiers over
time. [Ref. 11:pp. 7-9] His paper reports that this disparity
is a result of attrition. Specifically, the shifting of
subcategory promotion averages is a result of different
retention patterns among race and ethnic groups, and not due
to a racially sensitive promotion system.

A study to determine the magnitude and underlying reasons 
for the different retention patterns, and to test this 


hypothesis, would have considerable merit. 

APPENDIX A
CAREER MANAGEMENT FIELDS AND FREQUENCIES


                                          CUMULATIVE  CUMULATIVE
MOSNAME         CMF  FREQUENCY  PERCENT   FREQUENCY    PERCENT
Infantry         11       4320     11.4        4320       11.4
Cbt Engineer     12       1030      2.7        5350       14.1
Artillery        13       2780      7.3        8130       21.5
Air Defense      16        851      2.2        8981       23.7
Special Ops      18        244      0.6        9225       24.4
Armor            19       2434      6.4       11659       30.8
Hawk Missile     23        187      0.5       11846       31.3
Nike Missile     27        352      0.9       12198       32.2
Tac Radar        28         40      0.1       12238       32.3
Tac Radar        29        625      1.7       12863       34.0
Communications   31       3265      8.6       16128       42.6
Elect Warfare    33         30      0.1       16158       42.7
Techn Drafter    51        619      1.6       16777       44.3
Chem Warfare     54        529      1.4       17306       45.7
Explosive Ord    55        400      1.1       17706       46.8
Repair           63       3766      9.9       21472       56.7
Cargo Spec       64       1041      2.8       22513       59.5
A/C Repair       67       1090      2.9       23603       62.4
Admin Spec       71       3020      8.0       26623       70.3
Programmer       74        423      1.1       27046       71.4
Supply           76       2677      7.1       29723       78.5
Recruiter        79        106      0.3       29829       78.8
Topo Eng         81         65      0.2       29894       79.0
AV Spec          84        157      0.4       30051       79.4
Medical          91       2498      6.6       32549       86.0
Lab Spec         92        444      1.2       32993       87.2
Air Traffic      93        175      0.5       33168       87.6
Food Serv        94        919      2.4       34087       90.0
Mil Police       95       1674      4.4       35761       94.5
Intelligence     96        789      2.1       36550       96.6
Musician         97        176      0.5       36726       97.0
EW/SIGINT        98        925      2.4       37651       99.5